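Tencent Senior Big Data Development Engineer (Data Warehouse Track)
Experienced hire · Full-time · 2+ years · CSIG · Technology · Location: Shenzhen · Status: Hiring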
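Requirements
1. Bachelor's degree or above, with 5+ years of big data development experience, including real-time/offline data processing, data modeling, ETL development and design, and data governance; experience in data analysis and building metric systems is a plus.
2. Proficient in at least two of HiveSQL, Python, Scala, and similar languages and tools, with hands-on experience.
3. Solid grasp of big data technologies such as Iceberg, Spark, Flink, Hadoop, and Hive, including an understanding of their principles, with hands-on experience.
4. Familiar with at least one OLAP engine such as ClickHouse or StarRocks, with an understanding of how it works internally and hands-on experience.
5. Sensitive to data, meticulous and responsible, with strong problem analysis and problem-solving skills.
6. Strongly self-driven, with good teamwork and communication skills, and able to adapt to a fast-paced work environment.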
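Responsibilities
1. Plan and build the offline and real-time data warehouses for the Yuanbao business; taking into account the characteristics of the data, technology, and applications involved, build a highly available, easily extensible data warehouse system that efficiently meets the business's data needs.
2. Establish data warehouse and data quality standards and specifications, define data governance plans, and work with internal and external teams to drive their implementation, continuously improving data quality and ensuring data is timely, accurate, and stable.
3. Continuously optimize data warehouse models, abstract and consolidate general-purpose solutions and platform tooling, and improve efficiency for both developers and data users.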
Includes English-language materials
Education
Big Data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
ETL
https://www.ibm.com/think/topics/etl
ETL—meaning extract, transform, load—is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system.
https://www.youtube.com/watch?v=OW5OgsLpDCQ
It explains what ETL is and what it can do for you to improve your data analysis and productivity.
Data Governance
https://www.ibm.com/think/topics/data-governance
Data governance is the data management discipline that focuses on the quality, security and availability of an organization’s data.
https://www.youtube.com/watch?v=uPsUjKLHLAg
Building data fabric eliminates the technological complexities of data governance so users can connect to the right data at the right time, regardless of where it resides.
Data Analysis
[English] Data Analyst Roadmap
https://roadmap.sh/data-analyst
Step-by-step guide to becoming a Data Analyst in 2025
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, starts from zero, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction to all of the core concepts in Python.
Scala
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
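Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop provides reliable, scalable application-layer computing and storage support for large computer clusters. It allows distributed processing of large data sets across clusters of computers using simple programming models, and scales from a single machine to several thousand machines.
[English] Hadoop Tutorial
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models.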
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
ClickHouse
[English] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
OLAP
https://www.youtube.com/watch?v=iw-5kFzIdgY
OLAP (for online analytical processing) is software for performing multidimensional analysis at high speeds on large volumes of data from a data warehouse, data mart, or some other unified, centralized data store.
Iceberg
https://iceberg.apache.org/spark-quickstart/
This guide will get you up and running with Apache Iceberg™ using Apache Spark™, including sample code to highlight some powerful features.
https://www.baeldung.com/apache-iceberg-intro
This tutorial will discuss Apache Iceberg, a popular open table format in today’s big data landscape.
https://www.youtube.com/watch?v=TsmhRZElPvM
You’ve probably heard about Apache Iceberg™—after all, it’s been getting a lot of buzz.
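Related positions
Experienced hire · A133877
1. Drawing on data from across the Douyin ecosystem, take part in building a data warehouse over massive data volumes that fits the characteristics of the security-ecosystem business.
2. Own the architecture design and development of data models, performance tuning at massive data scale, and delivery of requirements in complex business scenarios.
3. Take part in building data management capabilities around security, quality, efficiency, and cost, and drive rollout in a specific cross-cutting scenario.
4. Dig into the business, understand and appropriately abstract business requirements, unlock the value of data, and work closely with business teams.
Updated 2024-08-01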
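Experienced hire · 3-5 years · Technology
1. Build and develop the data warehouse for Didi's core ride-hailing business, carry out domain-level data warehouse modeling with ongoing optimization, and continuously improve data efficiency.
2. Abstract core business processes and build up easy-to-use data architecture, general-purpose analysis frameworks, and data application products.
3. Optimize data development workflows and architecture, keep improving the data governance system, and continuously raise the quality of data warehouse construction.
4. Explore applications of new technologies to drive technical transformation and upgrades.
Updated 2025-09-19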