小红书Spark数据引擎研发专家
社招全职3-5年数据引擎地点:上海 | 北京 | 杭州状态:招聘
任职要求
1、至少熟悉一款主流大数据框架及源码,比如Spark、StarRocks、ClickHouse、Impala、Doris、MySQL等; 2、熟悉Spark生态系统,如RSS,Kuybbi, HMS等,有实际的Spark 代码编写、调优部署和问题排查经验。 3、熟悉主流的OLAP引擎的技术优势,如向量化执行、SIMD编程、列式存储、并行编程、异步编程、查询编译等; 4、精通C++/Java编程语言,对K8S,元数据有一定的了解和使用经验。 加分项:a. 熟悉Velox、CK等任意一项向量化算子实现者可加分 b. 熟悉Spark on K8S 并有实际部署经验者可加分 c. 了解包括推广搜在内的算法数据工程链路,并有实际治理经验者可加分
工作职责
1、参与小红书Spark离线引擎的研发工作,支撑小红书云原生大规模离线数据处理场景,包括数据仓库、机器学习等场景,提升离线处理引擎的性能和稳定性 2、参与小红书Spark Native Engine 和 Serverless Spark 架构的研发工作,提升任务时效性,同时利用离在线混部降低资源成本 3、参与统一元数据工作,为小红书算法AI团队管理非结构化数据,提供统一访问方式,简化算法数据开发链路,并进行数据治理
包括英文材料
大数据+
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Spark+
[英文] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
StarRocks+
https://docs.starrocks.io/docs/quick_start/
These Quick Start guides will help you get going with a small StarRocks environment.
https://itnext.io/introduction-to-starrocks-a-new-modern-analytical-database-1db2177d26e1
Recently, I had the opportunity to explore StarRocks which is the new kid in the block when talking about massive scale databases which are able to handle petabytes of data.
ClickHouse+
[英文] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
Impala+
[英文] Impala Tutorials
https://impala.apache.org/docs/build/html/topics/impala_tutorial.html
This section includes tutorial scenarios that demonstrate how to begin using Impala.
Doris+
https://doris.apache.org/docs/gettingStarted/what-is-apache-doris
MySQL+
https://juejin.cn/post/7190306988939542585
这是一篇 MySQL 通关一篇过硬核经验学习路线,包括数据库相关知识,SQL语句的使用,数据库约束,设计等。
[英文] MySQL Tutorial
https://www.mysqltutorial.org/
your go-to resource for mastering MySQL in a fast, easy, and enjoyable way.
https://www.youtube.com/watch?v=5OdVJbNCSso
MySQL SQL tutorial for beginners
https://www.youtube.com/watch?v=7S_tz1z_5bA
This beginner-friendly course teaches you SQL from scratch.
OLAP+
https://www.youtube.com/watch?v=iw-5kFzIdgY
OLAP (for online analytical processing) is software for performing multidimensional analysis at high speeds on large volumes of data from a data warehouse, data mart, or some other unified, centralized data store.
C+++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Java+
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Kubernetes+
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
本教程介绍 Kubernetes 集群编排系统的基础知识。每个模块包含关于 Kubernetes 主要特性和概念的一些背景信息,还包括一个在线教程供你学习。
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
C+
https://www.freecodecamp.org/chinese/news/the-c-beginners-handbook/
本手册遵循二八定律。你将在 20% 的时间内学习 80% 的 C 编程语言。
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic C Programming language book written by Brian Kernighan and Dennis Ritchie.
算法+
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
相关职位

社招2年以上技术类
1、【引擎研发】负责Spark、Presto、Hive 为基础的大数据查询引擎内核研发,跟进社区版本,改进性能,提升稳定性,研发新功能,修复内核BUG; 2、【业务支撑】负责排查、定位、解决生产集群问题,与运维同学一起维护生产集群的稳定性,协助业务方一起使用好大数据平台; 3、【平台规划】参与规划公司计算平台的技术演技,提升计算平台湖仓能力,基于云IAAS或者自建IAAS,打造高稳定性、高性能、低成本的计算平台。
更新于 2023-12-26
社招D7195
1、参与快手EB级大数据平台计算引擎相关系统的研发与优化工作,解决实际业务需求与性能问题; 2、接受大数据平台系统设计与实现复杂度的挑战,分析和发现系统的优化点,负责推动系统的合理性、可靠性、可用性的提升; 3、和开源社区保持交流,从社区引入对公司业务场景有帮助的特性与系统,或将内部研发的功能贡献到社区。
更新于 2025-03-07
社招D7195
1、参与快手EB级大数据平台计算引擎相关系统的研发与优化工作,解决实际业务需求与性能问题; 2、接受大数据平台系统设计与实现复杂度的挑战,分析和发现系统的优化点,负责推动系统的合理性、可靠性、可用性的提升; 3、和开源社区保持交流,从社区引入对公司业务场景有帮助的特性与系统,或将内部研发的功能贡献到社区。
更新于 2025-03-07