字节跳动数据仓库开发工程师-视频与边缘
社招全职2年以上A103835A地点:北京状态:招聘
任职要求
1、计算机或相关专业本科及以上学历,2年以上数据工作经验; 2、熟悉并行计算或者分布式计算原理,熟悉高并发、高稳定性、可线性扩展、海量数据的系统特点和技术方案; 3、具有丰富的数据开发经验,对数据处理、数据建模、数据分析等有深刻认识和实战经验,熟悉Kimball维度建模方法论; 4、精通SQL,具备海量数据下的性能调优能力,熟练掌握Java或Python; 5、熟悉大数据相关工具/框架经验者优先,如:Hadoop、Hive、Spark、Kafka、Flink、Clickhouse Doris; 6、业务理解力强,对数据、新技术敏感,对云计算、大数据技术充满热情。
工作职责
1、负责多媒体网络音视频质量数据开发、调优、运维等工作,构建数据仓库体系; 2、负责数仓模型设计、ETL开发,海量数据下的性能调优,以及复杂业务场景下的需求交付; 3、参与数据治理,面对PB级存量数据和万亿条级别的新增数据量,提升数据易用性及数据质量,降低数据处理成本; 4、深入业务,理解并合理抽象业务需求,沉淀高质量体系化的数据资产,为业务赋能。
包括英文材料
学历+
高并发+
https://www.baeldung.com/concurrency-principles-patterns
In this tutorial, we’ll discuss some of the design principles and patterns that have been established over time to build highly concurrent applications.
https://www.baeldung.com/java-concurrency
Handling concurrency in an application can be a tricky process with many potential pitfalls. A solid grasp of the fundamentals will go a long way to help minimize these issues.
https://www.oreilly.com/library/view/concurrency-in-go/9781491941294/
You’ll understand how Go chooses to model concurrency, what issues arise from this model, and how you can compose primitives within this model to solve problems.
https://www.oreilly.com/library/view/modern-concurrency-in/9781098165406/
With this book, you'll explore the transformative world of Java 21's key feature: virtual threads.
https://www.youtube.com/watch?v=qyM8Pi1KiiM
https://www.youtube.com/watch?v=wEsPL50Uiyo
数据分析+
[英文] Data Analyst Roadmap
https://roadmap.sh/data-analyst
Step by step guide to becoming an Data Analyst in 2025
SQL+
https://liaoxuefeng.com/books/sql/introduction/index.html
什么是SQL?简单地说,SQL就是访问和处理关系数据库的计算机标准语言。
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
性能调优+
https://goperf.dev/
The Go App Optimization Guide is a series of in-depth, technical articles for developers who want to get more performance out of their Go code without relying on guesswork or cargo cult patterns.
https://web.dev/learn/performance
This course is designed for those new to web performance, a vital aspect of the user experience.
https://www.ibm.com/think/insights/application-performance-optimization
Application performance is not just a simple concern for most organizations; it’s a critical factor in their business’s success.
https://www.oreilly.com/library/view/optimizing-java/9781492039259/
Performance tuning is an experimental science, but that doesn’t mean engineers should resort to guesswork and folklore to get the job done.
Java+
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
中文,免费,零起点,完整示例,基于最新的Python 3版本。
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
大数据+
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Hadoop+
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop 为庞大的计算机集群提供可靠的、可伸缩的应用层计算和存储支持,它允许使用简单的编程模型跨计算机群集分布式处理大型数据集,并且支持在单台计算机到几千台计算机之间进行扩展。
[英文] Hadoop Tutorial
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models.
Hive+
[英文] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Spark+
[英文] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Kafka+
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
Flink+
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
ClickHouse+
[英文] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
Doris+
https://doris.apache.org/docs/gettingStarted/what-is-apache-doris
相关职位
社招2年以上A64928A
1、负责多媒体网络音视频质量数据开发、调优、运维等工作,构建数据仓库体系; 2、负责数仓模型设计、ETL开发,海量数据下的性能调优,以及复杂业务场景下的需求交付; 3、参与数据治理,面对PB级存量数据和万亿条级别的新增数据量,提升数据易用性及数据质量,降低数据处理成本; 4、深入业务,理解并合理抽象业务需求,沉淀高质量体系化的数据资产,为业务赋能。
更新于 2025-04-21
社招2年以上A220281
1、负责多媒体网络音视频质量数据开发、调优、运维等工作,构建数据仓库体系; 2、负责数仓模型设计、ETL开发,海量数据下的性能调优,以及复杂业务场景下的需求交付; 3、参与数据治理,面对PB级存量数据和万亿条级别的新增数据量,提升数据易用性及数据质量,降低数据处理成本; 4、深入业务,理解并合理抽象业务需求,沉淀高质量体系化的数据资产,为业务赋能。
更新于 2025-04-21
社招JDS8P
1、 负责头条和西瓜视频的离线与实时数据仓库的构建; 2、 负责数据模型的设计,ETL实施,ETL性能优化,ETL数据监控以及相关技术问题的解决; 3、 负责指标体系建设与维护; 4、 深入业务,理解并合理抽象业务需求,发挥数据价值,与业务团队紧密合作; 5、 参与大数据应用规划,为数据产品、业务团队提供应用指导; 6、 参与数据治理工作,提升数据易用性及数据质量。
更新于 2021-06-03