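ByteDance Senior Expert, Open-Source Big Data Engines - EMR / Big Data Cloud Platform
Experienced hire · Full-time · 3+ years · A35623 · Location: Beijing · Status: Hiring
Requirements
1. A degree in computer science or a related field, with at least 3 years of experience in big data or database kernel development;
2. Proficiency in one or more of C, C++, Java, and Rust;
3. Familiarity with components of the open-source big data ecosystem or with comparable commercial analytical databases;
4. Familiarity with big data application scenarios and architectures; experience operating and using very large-scale clusters of engines such as Hadoop, Hive, Spark, Flink, or Presto is preferred;
5. Source-level optimization experience or in-depth study of engine source code is preferred, as are contributions to open-source communities.
Responsibilities
1. Take part in kernel development for ByteDance's EMR open-source big data engines, building a new paradigm for high-speed data analytics;
2. Engage deeply with the relevant open-source communities and help raise the open-source influence of the big data products;
3. Help customers process massive data volumes in production workloads, resolve difficult problems, and realize the value of their data;
4. Build the technical competitiveness of the EMR big data engines and create an industry-leading engine ecosystem platform.
Includes English-language materials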
Big Data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Kernel
https://www.youtube.com/watch?v=C43VxGZ_ugU
I rummage around the Linux kernel source and try to understand what makes computers do what they do.
https://www.youtube.com/watch?v=HNIg3TXfdX8&list=PLrGN1Qi7t67V-9uXzj4VSQCffntfvn42v
Learn how to develop your very own kernel from scratch in this programming series!
https://www.youtube.com/watch?v=JDfo2Lc7iLU
Denshi goes over a simple explanation of what computer kernels are and how they work, alongside what makes the Linux kernel special.
C
https://www.freecodecamp.org/chinese/news/the-c-beginners-handbook/
This handbook follows the 80/20 rule: you will learn 80% of the C programming language in 20% of the time.
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic C Programming language book written by Brian Kernighan and Dennis Ritchie.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Rust
https://www.youtube.com/watch?v=BpPEoZW5IiY
In this comprehensive Rust course for beginners, you will learn about the core concepts of the language and underlying mechanisms in theory.
https://www.youtube.com/watch?v=lzKeecy4OmQ
Full Rust 101 Crash Course for beginners.
https://www.youtube.com/watch?v=rQ_J9WH6CGk
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
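Hadoop provides reliable, scalable application-layer computing and storage support for very large computer clusters. It allows large data sets to be processed in a distributed fashion across clusters of computers using simple programming models, and it scales from a single machine to several thousand machines.
[English] Hadoop Tutorial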
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Presto
[English] What is Presto?
https://prestodb.io/what-is-presto/
https://www.tutorialspoint.com/apache_presto/index.htm
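Related positions
Experienced hire · 3+ years · A96870A
1. Own the evolution planning, competitiveness building, and development of B2B products in the Data+AI space, creating industry-leading products;
2. Own deep optimization of compute and storage engine kernels, staying ahead of the open-source community and the industry and delivering incremental value to customers;
3. Own development of data processing frameworks and data processing operators for large models, leading market demand;
4. Integrate deeply with the ecosystems of products such as Volcano Engine Ark (火山引擎方舟) and the machine learning platform, building competitive end-to-end AI offerings;
5. Support customers' data needs in scenarios such as pre-training, post-training, model distillation, AI search, RAG, and agents.
Updated 2025-03-10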
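Experienced hire · 3+ years · Y3055
1. Build market-competitive big data platform products for customers, such as the big data development and governance suite, EMR, and LAS; enrich the Volcano Engine data middle-platform product matrix; identify industry opportunities; and continually uncover the products' commercial potential;
2. Abstract customer needs across industries and turn them into high-quality product designs or solutions;
3. Work with commercial sales and solutions teams to present and pitch products to external customers;
4. Define the products' go-to-market (GTM) strategy and pricing to keep them competitive in the industry.
Updated 2022-05-13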
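Experienced hire · 3+ years · 诚云科技
1. Own operations of Alibaba Cloud's open-source big data platform (Flink/EMR/Spark/StarRocks/ES/Hadoop/K8S), including the observability pipeline, monitoring and alerting, incident response and handling, and measuring and improving SLA availability;
2. Develop a big data operations management and control platform that raises operations efficiency through automation, including CI/CD for delivery and changes and intelligent diagnosis and fault isolation;
3. Put AIOps into practice, using AI algorithms to improve stability, including anomaly detection, root cause localization, and operations built on large models and agents;
4. Own stability architecture design and drive projects through to delivery, including cloud-native infrastructure, cross-AZ high-availability architecture, and evolution of the products' operability architecture.
Updated 2025-09-25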