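ByteDance Data Warehouse Development Engineer
Experienced hire | Full-time | 3+ years of experience | JKYE1 | Location: Beijing | Status: Hiring
Requirements
1. Bachelor's degree or above in computer science or a related field.
2. 3+ years of data warehouse development experience, including building an enterprise-grade data warehouse from scratch (0 to 1).
3. Solid grounding in data warehouse theory; familiar with dimensional data warehouse model design; substantial experience building the application layer; hands-on Data Vault modeling experience preferred.
4. Experience with large-scale data processing (ETL) and related operations; able to flexibly apply a variety of SQL to implement ETL processing.
5. Familiar with Hadoop ecosystem technologies, with hands-on experience; experience tuning hsql and spark-sql.
6. Proficient in at least one scripting language such as Shell or Python.
7. Strong logical thinking and communication skills; strict standards for code and design quality; values Code Review; knows what good programming practices look like.
8. Experience in the internet finance industry and familiarity with credit and payment businesses preferred.
Responsibilities
1. Lead or participate in building and operating the distributed data warehouse for the finance business.
2. Lead or participate in building the common layer of enterprise data assets, delivering on agile and intelligent goals in both tooling and outcomes.
3. Gain a deep understanding of the business; identify business and system problems from a data governance perspective; close the data governance loop; safeguard data quality and efficiency.
Includes English-language materials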
Data warehouse
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
Vault
[English] Tutorials | Vault
https://developer.hashicorp.com/vault/tutorials
Centrally store, access and deploy secrets
https://www.youtube.com/watch?v=klyAhaklGNU
Full HashiCorp Vault Tutorial explaining What is HashiCorp Vault, How Vault works, Vault Architecture
ETL
https://www.ibm.com/think/topics/etl
ETL—meaning extract, transform, load—is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system.
https://www.youtube.com/watch?v=OW5OgsLpDCQ
It explains what ETL is and what it can do for you to improve your data analysis and productivity.
SQL
https://liaoxuefeng.com/books/sql/introduction/index.html
What is SQL? Simply put, SQL is the standard computer language for accessing and processing relational databases.
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
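Hadoop provides reliable, scalable application-layer computing and storage support for large computer clusters. It allows large data sets to be processed in a distributed manner across clusters of computers using simple programming models, and it scales from a single machine to several thousand machines.
[English] Hadoop Tutorial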
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Bash
[English] The Bash Guide
https://guide.bash.academy/
A quality-driven guide through the shell's many features.
https://www.youtube.com/watch?v=tK9Oc6AEnR4
Understanding how to use bash scripting will enhance your productivity by automating tasks, streamlining processes, and making your workflow more efficient.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for absolute beginners, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Scripting
[English] Scripting language
https://en.wikipedia.org/wiki/Scripting_language
https://zhuanlan.zhihu.com/p/571097954
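A script is usually interpreted rather than compiled. Scripting languages are generally simple, easy to learn, and easy to use, with the goal of letting programmers finish writing programs quickly.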
Coding standards
[English] Google Style Guides
https://google.github.io/styleguide/
Every major open-source project has its own style guide: a set of conventions (sometimes arbitrary) about how to write code for that project. It is much easier to understand a large codebase when all the code in it is in a consistent style.
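Related positions
Experienced hire | 3+ years of experience | D6254
1. Build the offline data warehouse, or design, develop, and deliver the architecture of the real-time data system, within the traffic common data team.
2. Continuously optimize and iterate on the performance and stability of data systems and data services.
3. Work closely with business teams: dive deep into the business, understand and appropriately abstract business needs, and realize the value of data.
4. Build an industry-leading data warehouse system for the traffic domain and realize the value of data.
Updated 2025-03-07
Experienced hire | 3+ years of experience | A221722
1. Build the offline and real-time data warehouses for the core business of the Feishu People product line.
2. Design dimensional models and carry out big data development; solve technical problems such as data job performance optimization and quality improvement.
3. Connect data across different business lines to form a unified data model.
4. Own data governance across the entire product line and improve the quality of data assets.
Updated 2024-01-17
Experienced hire | 2+ years of experience | A64928A
1. Develop, tune, and operate multimedia network audio and video quality data; build the data warehouse system.
2. Own data warehouse model design, ETL development, performance tuning at massive data scale, and delivery of requirements in complex business scenarios.
3. Participate in data governance: facing PB-scale existing data and new data on the order of trillions of records, improve data usability and data quality and reduce data processing costs.
4. Dive deep into the business, understand and appropriately abstract business requirements, and build up high-quality, systematic data assets that empower the business.
Updated 2025-04-21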