字节跳动大数据开发工程师-流式计算方向
社招全职66D2地点:上海状态:招聘
任职要求
1、对流式计算系统有深入的了解,在生产环境有TB级别Flink实时计算系统开发经验,深入掌握Flink DataStream、FlinkSQL、Flink Checkpoint、Flink State等模块,有Flink源码阅读经验优先; 2、熟悉常见消息队列原理和应用调优,有Kafka、Plusar、RocketMQ等项目源码阅读经验优先; 3、熟悉Java、C++、Scala、Python等编程语言,有强悍的编码和 trouble-shooting 能力; 4、乐于挑战没有明显答案的问题,对新技术有强烈的学习热情,有PB级别数据处理经验加分; 5、有数据湖开发经验,熟悉Hudi、Iceberg、DeltaLake等至少一项数据湖技术,有源码阅读经验优先; 6、熟悉其他大数据系统经验者优先,YARN、K8S、Spark、SparkSQL、Kudu等; 7、有存储系统经验加分,HBase、Casscandra、RocksDB等。
工作职责
字节跳动推荐架构团队,负责字节跳动旗下相关产品的推荐系统架构的设计和开发,保障系统稳定和高可用;负责在线服务、离线数据流性能优化,解决系统瓶颈,降低成本开销;抽象系统通用组件和服务,建设推荐中台、数据中台,支撑新产品快速孵化以及为ToB赋能。 1、为大规模推荐系统设计和实现合理的实时(流式计算)数据系统; 2、设计和实现灵活可扩展、稳定、高性能的存储系统和计算模型; 3、生产系统的trouble-shooting,设计和实现必要的机制和工具保障生产系统整体运行的稳定性; 4、打造业界领先的流式计算框架等分布式系统,为海量数据和大规模业务系统提供可靠的基础设施。
包括英文材料
Flink+
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
消息队列+
https://www.youtube.com/watch?v=xErwDaOc-Gs
Kafka+
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
RocketMQ+
https://www.baeldung.com/apache-rocketmq-spring-boot
In this tutorial, we’ll create a message producer and consumer using Spring Boot and Apache RocketMQ, an open-source distributed messaging and streaming data platform.
Java+
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
C+++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Scala+
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
中文,免费,零起点,完整示例,基于最新的Python 3版本。
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Kubernetes+
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
本教程介绍 Kubernetes 集群编排系统的基础知识。每个模块包含关于 Kubernetes 主要特性和概念的一些背景信息,还包括一个在线教程供你学习。
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
Spark+
[英文] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Kudu+
https://kudu.apache.org/docs/quickstart.html
HBase+
[英文] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model that is similar to Google's big table designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with HBase shell.
RocksDB+
https://rocksdb.org/docs/getting-started.html
The RocksDB library provides a persistent key value store.
相关职位
社招A11239
1、为大规模推荐系统设计和实现合理的实时(流式计算)数据系统; 2、设计和实现灵活可扩展、稳定、高性能的存储系统和计算模型; 3、生产系统的Trouble-shooting,设计和实现必要的机制和工具保障生产系统整体运行的稳定性; 4、打造业界领先的流式计算框架等分布式系统,为海量数据和大规模业务系统提供可靠的基础设施。
更新于 2024-01-29
社招A259456A
团队介绍:字节跳动推荐架构团队,负责字节跳动超10亿用户产品推荐系统架构的设计和开发,保障系统稳定和高可用;负责在线服务、离线数据流性能优化,解决系统瓶颈,降低成本开销;抽象系统通用组件和服务,建设推荐中台、数据中台,支撑新产品快速孵化以及为ToB赋能;实现灵活可扩展的高性能存储系统和计算模型,打通离在线数据流,构建统一的数据中台,支持推荐/搜索/广告。 1、为大规模推荐系统设计和实现合理的离线/实时数据架构,打造业界领先的离在线存储、批式流式计算框架等分布式系统,为海量数据和大规模业务系统提供可靠的平台化基础设施; 2、深入推荐系统,探索数据架构如何为业务赋能,提升线上效果; 3、尝试打破现有边界,探索核心框架的演进、新技术的应用、推荐大模型的落地; 4、生产系统的TROUBLE-SHOOTING和成本优化,设计和实现必要的机制和工具保障生产系统整体运行的稳定性与效率。
更新于 2025-05-13
社招X9WV
1、为大规模推荐系统设计和实现合理的流式计算系统; 2、设计和实现灵活可扩展、稳定、高性能存储系统和计算模型; 3、生产系统的Trouble-shooting,设计和实现必要的机制和工具保障生产系统稳定运行; 4、打造业界领先的流式计算框架等分布式系统,为海量数据和大规模业务系统提供可靠的基础设施。
更新于 2021-12-31