Kuaishou Senior Data Warehouse Architect
Experienced hire | Full-time | 5+ years | D0599 | Location: Beijing | Status: Hiring
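Requirements
1. Degree in computer science or a related field, with 5+ years of data warehouse architecture design experience in the internet industry;
2. Familiar with the principles and tuning of mainstream big data systems such as Hadoop, Spark, Flink, Kafka, HDFS, and Hive;
3. Deep understanding of and hands-on experience with data collection, data development, data analysis, and data modeling;
4. In-depth thinking and a medium-to-long-term technical vision on big data architecture, products, and technology, with clear, mature ideas about how big data architecture should evolve;
5. Familiar with data governance knowledge and management skills, including but not limited to metadata management, data quality management, data standards management, data lifecycle management, and data security management;
6. Strong team spirit and the ability to collaborate efficiently.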
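Responsibilities
1. Based on the characteristics of Kuaishou's business and taking an end-to-end view, plan a sound data warehouse architecture, including common foundational data and subject-area data along with the corresponding processes and mechanisms, and iterate on it continuously;
2. Work with front-office and back-office teams to drive governance of the data warehouse architecture through effective, reasonable evaluation mechanisms, delivering improvements in efficiency, cost, and quality across business lines;
3. Bring strong business understanding and a win-win approach to collaboration, coordinating well with upstream and downstream teams during data warehouse architecture governance to keep data pipelines stable and efficient.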
Includes English-language materials
System Design
https://roadmap.sh/system-design
Everything you need to know about designing large scale systems.
https://www.youtube.com/watch?v=F2FmTdLtb_4
This complete system design tutorial covers scalability, reliability, data handling, and high-level architecture with clear explanations, real-world examples, and practical strategies.
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
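Hadoop provides reliable, scalable application-layer computing and storage support for very large computer clusters. It allows large data sets to be processed in a distributed manner across clusters of computers using simple programming models, and it scales from a single machine to several thousand machines.
[English] Hadoop Tutorial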
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Kafka
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
HDFS
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
https://www.ibm.com/cn-zh/think/topics/hdfs
The Hadoop Distributed File System (HDFS) is a file system that manages large data sets and can run on commodity hardware.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Data Analysis
[English] Data Analyst Roadmap
https://roadmap.sh/data-analyst
Step-by-step guide to becoming a Data Analyst in 2025
Big Data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Data Governance
https://www.ibm.com/think/topics/data-governance
Data governance is the data management discipline that focuses on the quality, security and availability of an organization’s data.
https://www.youtube.com/watch?v=uPsUjKLHLAg
Building data fabric eliminates the technological complexities of data governance so users can connect to the right data at the right time, regardless of where it resides.
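Related positions
Experienced hire | 5+ years | CDG Technology
1. Build the business subject-model system, applying the conceptual, logical, and physical model methodology to deliver enterprise-level data modeling and establish a standardized, extensible data model system;
2. Design a subject-domain partitioning scheme suited to the payments industry and establish a standardized, extensible subject-model framework;
3. Lead the architecture design and development of the real-time data warehouse, building a real-time data processing platform with millisecond-level response;
4. Collaborate closely with the AI team to integrate subject models with machine learning models;
5. Solve hard real-time data modeling problems in high-concurrency scenarios, supporting intelligent decision-making scenarios such as risk perception and case analysis.
Updated 2025-05-27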
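Experienced hire | 3+ years | NetEase Cloud Music
1. Develop the music offline data warehouse, using a sound data architecture to ensure the accuracy, consistency, and stability of internal and external data, covering data cleansing, model design, data governance, and stability assurance;
2. Understand the business in depth, consolidate business data requirements through insight into business strategy, and deliver and implement systematic solutions;
3. Work with data analysts to put data to work for product operations, and use technical innovation to make data create value for the business.
Updated 2025-03-12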
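Experienced hire | 3+ years | NetEase Cloud Music
1. Develop the music offline data warehouse, using a sound data architecture to ensure the accuracy, consistency, and stability of internal and external data, covering data cleansing, model design, data governance, and stability assurance;
2. Understand the business in depth, consolidate business data requirements through insight into business strategy, and deliver and implement systematic solutions;
3. Work with data analysts to put data to work for product operations, and use technical innovation to make data create value for the business.
Updated 2024-12-04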