ByteDance Data Warehouse Engineer
Experienced hire | Full-time | JVVJP | Location: Beijing | Status: Hiring
Requirements
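1. Expert in data warehouse implementation methodology, with a deep understanding of data warehouse architecture and experience supporting real business scenarios;
2. Strong coding skills; familiar with several of SQL, Python, Hive, Spark, Kafka, and Flink; experience processing data at TB scale or above;
3. Proficient in the batch-computing technology stack and familiar with stream-computing technologies; hands-on HTAP/HSAP experience is a plus;
4. Strong communicator, with an excellent ability to combine technology and business.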
Responsibilities
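1. Own data warehouse architecture design, modeling, and ETL development for one or more ByteDance business lines, such as Xingfuli, Toutiao, Games, Xigua Video, Search, Novels, and Content Moderation;
2. Take part in data governance to improve data usability and data quality, working closely with the data tooling team;
3. Understand and appropriately abstract business requirements to unlock the value of data, working closely with business teams.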
Includes English-language materials
Data Warehouse
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
SQL
https://liaoxuefeng.com/books/sql/introduction/index.html
What is SQL? Simply put, SQL is the standard computer language for accessing and manipulating relational databases.
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for complete beginners, with full examples, based on the latest version of Python 3.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Kafka
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Related positions
Experienced hire | 2+ years of experience
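1. Use the Alibaba Cloud DataWorks platform to carry out data collection, cleansing, processing, modeling, and report development in support of the data needs of complex business scenarios;
2. Develop a deep understanding of business logic, communicate efficiently with business teams, and turn vague requirements into clear data development plans;
3. Develop and maintain ETL jobs, ensuring the accuracy, stability, and timeliness of data pipelines;
4. Manage data lineage and maintain data documentation (such as data dictionaries and processing-logic descriptions) to keep data traceable;
5. Analyze data anomalies, quickly identify root causes, and drive resolution so that the data behind business decisions is reliable;
6. Help optimize data usage workflows so that business users can obtain and analyze data more efficiently.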
Updated 2025-05-28
Experienced hire | 5+ years of experience | Travel business AI &
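1. Participate in the architecture design and development of offline and real-time data warehouses, building efficient, stable, and scalable data warehouse systems;
2. Own data warehouse model design, including star, snowflake, and constellation schemas, and define data warehouse development standards;
3. Own data asset operations: based on current business conditions, resolve bottlenecks in producing and consuming data assets, and improve the data usage experience of the teams involved;
4. Develop and maintain ETL/ELT data pipelines so that data is loaded from source systems into the data warehouse efficiently and accurately;
5. Optimize data warehouse performance, resolving bottlenecks in data queries and data loading.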
Updated 2025-03-06
Experienced hire | Accommodation business AI &
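1. Collect, cleanse, and load offline and online data;
2. Produce reports from targeted analyses to provide data support for business decisions and monitoring;
3. Analyze and distill insights from Ctrip's large volume of merchant and user data.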
Updated 2025-06-17