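ByteDance Big Data Development Engineer - Foundational Data Warehouse
Experienced hire | Full-time | A79080A | Location: Beijing | Status: Hiring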
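Requirements
1. Solid computer science fundamentals, with strong coding and SQL skills; able to reliably deliver data development and analysis work in full and to a high standard.
2. Keen to explore new technologies and solutions to emerging industry problems; a proactive learner.
3. Experience with big data tools/frameworks is preferred, e.g. Hadoop, Hive, Spark, Kafka, Flink, ClickHouse.
4. Strong learning and communication skills; candidates who are good at summarizing and running retrospectives are preferred.
5. The role involves cross-border collaboration and requires English reading and writing skills; strong English listening and speaking skills are preferred.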
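Responsibilities
1. Develop, tune, and operate the company's foundational data, supporting general-purpose data needs.
2. Participate in data governance: working with EB-scale existing data and trillions of newly added records, improve data usability and data quality and reduce data processing costs.
3. Work closely with the teams behind underlying components such as development tools, compute engines, and infrastructure to explore, implement, and promote new technical solutions that reduce cost and improve efficiency.
4. Understand and appropriately abstract business requirements, and communicate closely with business teams to better realize the value of data.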
Includes English-language materials
SQL
https://liaoxuefeng.com/books/sql/introduction/index.html
What is SQL? Simply put, SQL is the standard computer language for accessing and manipulating relational databases.
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
Big Data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
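Hadoop provides reliable, scalable application-layer computing and storage support for very large computer clusters. It allows distributed processing of large data sets across clusters of computers using simple programming models, and scales from a single machine to several thousand machines.
[English] Hadoop Tutorial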
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework for storing and processing big data in a distributed environment across clusters of computers using simple programming models.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Kafka
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
ClickHouse
[English] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
Related positions

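Experienced hire | 3+ years | Group Commercial Department
1. Design data models, implement ETL, optimize performance, monitor ETL data, and resolve related technical issues.
2. For the advertising business, build topic-specific data; on top of the data warehouse, build user tags, core business tags, and feature-engineering data, integrate deeply with business scenarios, and provide data support to each business line.
3. Design, build, and implement the data warehouse system, including its standardized layering, and accumulate enterprise-level data assets to help improve the efficiency of business support and explore the incremental value of data.
4. Own the data governance and management system; combining business, metadata, and technology, drive resource cost optimization, improve the data quality of data services, and ensure stable data output.
Updated 2025-02-08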

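Experienced hire | 2+ years | Technical Support
1. Build offline and real-time data warehouses; design and develop data models.
2. Develop, tune, and maintain the company's foundational data.
3. Build and maintain the game metrics system.
4. Work deeply with the business, understand and appropriately abstract business requirements, realize the value of data, and collaborate closely with business teams.
5. Participate in data governance to improve data usability, standardization, and data quality.
Updated 2025-03-31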
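Experienced hire | 5-7 years | Digital Technology
- Plan and build company-level core data assets, supporting the design, development, and delivery of core business scenarios
- Work deeply across business domains, understand business needs, lead and build the data warehouse team, and realize the value of data for the business
- Own the foundational architecture and rollout of the offline and real-time data warehouses, drive adoption of unified batch and stream processing, and take responsibility for data delivery across the full lifecycle
- Participate in data development for data products and applications, uncover the commercial value of data, and work with product and engineering teams to build data products with an excellent user experience
- Provide efficient solutions for massive-scale data processing and analysis, deliver on real-time and offline requirements, and provide reliable technical support to foundational developers
- In line with business direction, dig deeply into data needs, produce technical solutions and standards, and explore cutting-edge industry technologies
Updated 2025-07-25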