ByteDance Real-Time Data Development Engineer
Experienced hire | Full-time | J6HX2 | Location: Beijing | Status: Hiring
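Requirements
1. Solid foundation in computer science, programming, and data structures and algorithms; good engineering practice and efficient problem-solving skills.
2. Expert in mainstream big data and stream processing technologies, with solid experience building big data and distributed systems, including computing and operations experience with streaming platforms such as Flink, Storm, Flume, and Kafka.
3. Knowledge of data warehouse architecture, data modeling methods, and data governance; passionate about exploring the value of data, with strong business understanding and abstraction skills.
4. Knowledge of databases and strong SQL/ETL development skills.
5. Proficiency in at least one OLAP engine such as ES, HBase, Druid, or Doris, and in a programming language such as Java, Scala, or Python.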
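Responsibilities
1. Plan, design, and deliver the real-time data systems for the department's user growth business, providing business teams with efficient real-time and offline data support.
2. Develop real-time data marts and real-time data metrics for the department's growth-related business.
3. Optimize the performance of real-time data computation and data services, providing stable real-time data services to the business.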
Includes English-language materials
Data structures
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data Structures and Algorithms full course tutorial java
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Big data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Distributed systems
https://www.distributedsystemscourse.com/
The home page of a free online class in distributed systems.
https://www.youtube.com/watch?v=7VbL89mKK3M&list=PLOE1GTZ5ouRPbpTnrZ3Wqjamfwn_Q5Y9A
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Kafka
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
Data warehouse
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
Data governance
https://www.ibm.com/think/topics/data-governance
Data governance is the data management discipline that focuses on the quality, security and availability of an organization’s data.
https://www.youtube.com/watch?v=uPsUjKLHLAg
Building data fabric eliminates the technological complexities of data governance so users can connect to the right data at the right time, regardless of where it resides.
SQL
https://liaoxuefeng.com/books/sql/introduction/index.html
What is SQL? Simply put, SQL is the standard computer language for accessing and processing relational databases.
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
ETL
https://www.ibm.com/think/topics/etl
ETL—meaning extract, transform, load—is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system.
https://www.youtube.com/watch?v=OW5OgsLpDCQ
It explains what ETL is and what it can do for you to improve your data analysis and productivity.
Elasticsearch
https://www.youtube.com/watch?v=a4HBKEda_F8
Learn about Elasticsearch with this comprehensive course designed for beginners, featuring both theoretical concepts and hands-on applications using Python (though applicable to any programming language). The course is structured in two parts: first covering essential Elasticsearch fundamentals including index management, document storage, text analysis, pipeline creation, search functionality, and advanced features like semantic search and embeddings; followed by a practical section where you'll build a real-world website using Elasticsearch as a search engine, working with the Astronomy Picture of the Day (APOD) dataset to implement features such as data cleaning pipelines, tokenization, pagination, and aggregations.
HBase
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with the HBase shell.
Doris
https://doris.apache.org/docs/gettingStarted/what-is-apache-doris
OLAP
https://www.youtube.com/watch?v=iw-5kFzIdgY
OLAP (for online analytical processing) is software for performing multidimensional analysis at high speeds on large volumes of data from a data warehouse, data mart, or some other unified, centralized data store.
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Scala
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, starts from zero, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction to all of the core concepts in Python.
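Related positions
Experienced hire | A98746
1. Participate in building offline and real-time data warehouses to support the growth of the international local services business.
2. Dive deep into the business, understand and appropriately abstract business requirements, and own their implementation; work closely with business teams to provide data solutions.
3. Participate in data model design, ETL implementation, ETL performance optimization, ETL data monitoring, and resolving related technical issues.
4. Participate in planning and supporting big data applications, providing technical support to the data product and data mining teams.
5. Participate in data governance to improve data usability and data quality.
Updated 2025-06-05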

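Experienced hire | 3+ years | Technical
1. Build the models for offline and real-time data warehouses.
2. Support the data needs of search, user profiling, audience and product selection, and other businesses, ensuring timely and accurate output.
3. Take a deep part in building data products, providing complete data solutions inside and outside the company.
4. Work with the data analytics team to understand business requirements and provide suitable data processing solutions.
5. Participate in data governance to improve data usability and data quality.
Updated 2025-05-12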
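Experienced hire | 3+ years | A45360
1. Build offline and real-time data warehouses for the R&D efficiency platform.
2. Build efficiency analysis and insight reports.
3. Dive deep into the business, understand and appropriately abstract business requirements, realize the value of data, and work closely with business teams.
4. Collaborate closely with R&D teams and the PMO team, using data to support the business.
Updated 2025-04-22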