字节跳动高级大数据开发工程师
社招全职A133877地点:北京状态:招聘
任职要求
1、熟悉数据仓库体系架构、数据建模方法、数据治理等知识; 2、对数据价值探索充满热情,较强的业务理解和抽象能力,能快速分析和理解问题; 3、有较强的SQL/ETL开发能力,掌握大数据技术栈,包括Hadoop/Hive/Spark/OLAP引擎等; 4、思维逻辑清晰,良好的自驱力、沟通能力和解决问题能力; 5、扎实的数据结构、数据库原理等基础知识; 6、具备平台治理业务经验&实时数据体系建设经验优先。
工作职责
1、以抖音系全域数据为依托,参与构建海量数据下符合安全生态业务特性的数据仓库建设; 2、负责数据模型的架构设计、开发以及海量数据下的性能调优,复杂业务场景下的需求交付; 3、参与构建围绕安全、质量、效率、成本等方向的数据管理能力建设,并推动某细分横向场景的落地; 4、深入业务,理解并合理抽象业务需求,发挥数据价值,与业务团队紧密合作。
包括英文材料
数据仓库+
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
数据治理+
https://www.ibm.com/think/topics/data-governance
Data governance is the data management discipline that focuses on the quality, security and availability of an organization’s data.
https://www.youtube.com/watch?v=uPsUjKLHLAg
Building data fabric eliminates the technological complexities of data governance so users can connect to the right data at the right time, regardless of where it resides.
SQL+
https://liaoxuefeng.com/books/sql/introduction/index.html
什么是SQL?简单地说,SQL就是访问和处理关系数据库的计算机标准语言。
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
ETL+
https://www.ibm.com/think/topics/etl
ETL—meaning extract, transform, load—is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system.
https://www.youtube.com/watch?v=OW5OgsLpDCQ
It explains what ETL is and what it can do for you to improve your data analysis and productivity.
大数据+
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Hadoop+
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop 为庞大的计算机集群提供可靠的、可伸缩的应用层计算和存储支持,它允许使用简单的编程模型跨计算机群集分布式处理大型数据集,并且支持在单台计算机到几千台计算机之间进行扩展。
[英文] Hadoop Tutorial
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models.
Hive+
[英文] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Spark+
[英文] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
OLAP+
https://www.youtube.com/watch?v=iw-5kFzIdgY
OLAP (for online analytical processing) is software for performing multidimensional analysis at high speeds on large volumes of data from a data warehouse, data mart, or some other unified, centralized data store.
数据结构+
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data Structures and Algorithms full course tutorial java
相关职位
社招2年以上CSIG技术
1.负责元宝业务的离线和实时数仓规划和建设,结合数据、技术与应用等多方特性,构建高可用、易扩展的数仓体系,高效满足业务用数诉求; 2.负责建立数仓与数据质量标准和规范,确定数据治理方案,并与内外部团队协作,推动落地实施,不断提升数据质量,确保数据及时、准确与稳定性; 3.不断优化数仓模型,抽象总结并沉淀通用方案与平台工具能力,提升研发与用户用数效率。
更新于 2025-08-01
社招信息技术类
1.承接实时/离线大数据处理流程开发,满足平台内业务数据需求。 2.对大数据服务进行性能调优,保障集群的高效与平稳运行,提升系统稳定性和可扩展性。 3.持续升级计算存储架构,更好支持业务发展。 工作
更新于 2025-06-28