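DiDi Senior/Expert Data Warehouse Development Engineer (J250812021)
Experienced hire, full-time, 5+ years of experience, Technology; Location: Beijing; Status: Hiring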
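Requirements
1. Bachelor's degree or above in computer science, finance, mathematics, or a related field, with at least 5 years of work experience;
2. Familiar with financial business and proficient in data warehouse modeling and ETL implementation methodology; experience designing and implementing PB-scale financial data warehouses, and able to optimize data warehouse architecture for greater efficiency;
3. Familiar with the technical components of the Hadoop big data ecosystem and with technologies such as Hive/Flink/Python/MR/Spark; experience with data warehouse work in an internet-scale big data environment;
4. Familiar with data asset management and data governance techniques such as data standards, data quality, data metrics, and data cost control, with hands-on data governance experience;
5. Strong execution, drive, communication, and collaboration skills; able to resolve business and data pain points and realize the value of data assets.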
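Responsibilities
1. Independently own requirements management, architecture design, model building, and data development for the real-time and offline data warehouses of a given financial business line, ensuring the stability and accuracy of data services;
2. Mentor junior and mid-level members of the data warehouse team in data warehouse modeling, data governance, and financial business;
3. Reduce the cost and improve the efficiency of data warehouse work through data asset governance, faster delivery of data requirements, and similar measures;
4. Work closely with upstream and downstream teams to provide effective data support for financial business analysis, business decision-making, business operations, and data products, empowering the business;
5. Benchmark against industry-leading data technologies and apply best technical practices to resolve pain points in business data requirements.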
Includes English-language materials
Education
Data warehouse
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
ETL
https://www.ibm.com/think/topics/etl
ETL—meaning extract, transform, load—is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system.
https://www.youtube.com/watch?v=OW5OgsLpDCQ
It explains what ETL is and what it can do for you to improve your data analysis and productivity.
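The extract, transform, load steps described above can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not a production pipeline: the two inline CSV strings, the `fact_payment` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the target warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw inputs standing in for two separate source systems.
ORDERS_CSV = "order_id,amount\n1, 10.50\n2,abc\n3,7.25\n"
REFUNDS_CSV = "order_id,amount\n3,2.25\n"


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows, kind):
    """Transform: cast types, tag the source, and drop rows that fail to parse."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((int(row["order_id"]), kind, float(row["amount"])))
        except ValueError:
            continue  # skip malformed rows instead of loading bad data
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into one consistent target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_payment "
        "(order_id INTEGER, kind TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_payment VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl():
    """Run extract -> transform -> load for each hypothetical source."""
    conn = sqlite3.connect(":memory:")  # stand-in for the target warehouse
    for text, kind in ((ORDERS_CSV, "order"), (REFUNDS_CSV, "refund")):
        load(conn, transform(extract(text), kind))
    return conn
```

Calling `run_etl()` returns a connection whose `fact_payment` table holds the combined, cleaned rows from both sources; the malformed row with amount `abc` is dropped during the transform step.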
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
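Hadoop provides reliable, scalable application-layer computing and storage support for very large computer clusters. It allows large data sets to be processed in a distributed manner across clusters of computers using simple programming models, and supports scaling from a single computer to several thousand computers.
[English] Hadoop Tutorial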
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows big data to be stored and processed in a distributed environment across clusters of computers using simple programming models.
Big data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for absolute beginners, with complete examples, based on the latest Python 3 version.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction to all of the core concepts in Python.
MapReduce
https://www.youtube.com/watch?v=bcjSe0xCHbE
https://www.youtube.com/watch?v=cHGaQz0E7AU
In this video I explain the basics of the MapReduce model, an important concept for any software engineer to be aware of.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Data governance
https://www.ibm.com/think/topics/data-governance
Data governance is the data management discipline that focuses on the quality, security and availability of an organization’s data.
https://www.youtube.com/watch?v=uPsUjKLHLAg
Building data fabric eliminates the technological complexities of data governance so users can connect to the right data at the right time, regardless of where it resides.
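Related positions
Experienced hire, 5+ years of experience, Technology team, AI &
1. Responsible for the model design, development, and optimization of each layer (such as ODS, DWD, DWS, ADS) of the offline and real-time data warehouses, supporting data analysis and business applications;
2. Responsible for data warehouse development and report development for the group's financial data analysis system;
3. Independently complete data ETL development for complex business logic, task scheduling, and operations monitoring, ensuring the accuracy and stability of data processing pipelines;
4. Establish and monitor data quality rules; proactively discover, track, and resolve data quality issues to ensure data reliability and trustworthiness.
Updated 2025-09-08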
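Experienced hire, 5+ years of experience
1. Mine the value of data in depth; build and maintain the vehicle-side signal data warehouse system and data metrics system, providing framework support for algorithms and the data closed loop;
2. Participate in building a unified batch and stream data analysis platform that supports rapid locating and analysis of autonomous-driving perception and full-stack data at the scale of tens of billions of records;
3. Participate in platform architecture planning; track and research cutting-edge technologies, select and test toolchains, and solve the data platform's core technical challenges;
4. Establish monitoring and feedback metrics, continuously optimize the product's architecture and performance, and ensure the data quality and platform stability of the PB-scale data warehouse.
Updated 2025-05-14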
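Experienced hire, 5+ years of experience, Data, Business data
1. Present business performance comprehensively, form an overall judgment of business status, provide effective forecasts or early warnings, and assist the CEO in making company operating decisions;
2. Think and judge independently about the industry and competition, identify industry growth potential and opportunities, and exert a positive influence on strategic planning;
3. Building on improved data warehouse construction and governance, refine the business analysis metrics system to enable routine insight into business trends;
4. Through special-topic analyses, examine business problems in depth and provide data support for company operating decisions, product direction, and business strategy;
5. Drive projects that influence and improve core input and output metrics; distill analytical approaches and frameworks, extract data product requirements, and collaborate with relevant teams (such as the technical development team) to drive data products to launch.
Updated 2025-07-03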