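Xiaomi (Xiaomi Auto) - Data R&D Engineer - Data Warehouse Track
Experienced hire | Full-time | 5+ years | A145957 | Location: Nanjing | Status: Hiring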
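Requirements
1. At least 5 years of experience in data warehousing; familiar with data warehouse modeling methodology, with hands-on experience in model design and ETL development;
2. Familiar with Hadoop-ecosystem technologies such as Hive, HBase, Spark, Flink, Elasticsearch, Iceberg, Trino, and Doris, with experience developing applications on distributed data storage and computing platforms;
3. Familiar with data warehouse domain knowledge and management skills, including but not limited to: metadata management, data quality, …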
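Responsibilities
1. Participate in R&D for multiple data projects covering traditional and new-energy vehicles;
2. Participate in data warehouse architecture design and data mart construction; use data, algorithms, and engineering capabilities to process and distill data, empower business and products, and build an EB-scale shared data platform;
3. Own data management for the data platform: define development, quality, and assurance standards and drive their adoption; design, develop, and apply systems for metadata management, data quality checks, and tiered data management; improve data usability, availability, and stability;
4. Rapidly deliver and continually consolidate a standardized product data system so that data-driven business operations become more efficient and convenient;
5. Provide R&D support for data requirements from business teams, such as log instrumentation (event tracking), connected-vehicle data, collection of internal and external data, data synchronization, data cleansing and standardization, data model design, offline data processing, real-time data processing, data-as-a-service, and data visualization.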
Includes English materials
Data Warehouse
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
ETL
https://www.ibm.com/think/topics/etl
ETL—meaning extract, transform, load—is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system.
https://www.youtube.com/watch?v=OW5OgsLpDCQ
It explains what ETL is and what it can do for you to improve your data analysis and productivity.
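As a rough illustration of the extract, transform, load steps described above, here is a minimal Python sketch. It is not taken from the linked materials: the sample data, the table name `vehicle_speed`, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real warehouse target.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with inconsistent formatting and a missing value.
RAW_CSV = """vehicle_id,speed_kmh,city
v001, 62.5 ,Nanjing
v002,,Beijing
V003,48,  shanghai
"""


def extract(text):
    """Extract: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize ids and city names, drop rows without a speed."""
    cleaned = []
    for row in rows:
        speed = row["speed_kmh"].strip()
        if not speed:
            continue  # skip incomplete records
        cleaned.append((
            row["vehicle_id"].strip().lower(),
            float(speed),
            row["city"].strip().title(),
        ))
    return cleaned


def load(records, conn):
    """Load: write cleaned records into a single target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vehicle_speed "
        "(vehicle_id TEXT, speed_kmh REAL, city TEXT)"
    )
    conn.executemany("INSERT INTO vehicle_speed VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run the three steps in order and return the loaded rows."""
    load(transform(extract(text)), conn)
    return conn.execute(
        "SELECT vehicle_id, speed_kmh, city FROM vehicle_speed "
        "ORDER BY vehicle_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
rows = run_etl(RAW_CSV, conn)
```

Keeping the three steps as separate functions means each one can be tested or replaced on its own, for example swapping the SQLite target for a different storage system.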
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
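Hadoop provides reliable, scalable application-layer computing and storage support for large computer clusters. It allows distributed processing of large data sets across clusters of computers using simple programming models, and scales from a single machine to several thousand machines.
[English] Hadoop Tutorial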
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
HBase
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on the Hadoop file system, and ways to interact with the HBase shell.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Elasticsearch
https://www.youtube.com/watch?v=a4HBKEda_F8
Learn about Elasticsearch with this comprehensive course designed for beginners, featuring both theoretical concepts and hands-on applications using Python (though applicable to any programming language). The course is structured in two parts: first covering essential Elasticsearch fundamentals including index management, document storage, text analysis, pipeline creation, search functionality, and advanced features like semantic search and embeddings; followed by a practical section where you'll build a real-world website using Elasticsearch as a search engine, working with the Astronomy Picture of the Day (APOD) dataset to implement features such as data cleaning pipelines, tokenization, pagination, and aggregations.
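Related positions
Experienced hire | 5-10 years | D6264
1. Build site-wide foundational data capabilities, providing rich, stable public base data for the short-video community, and explore the incremental value of further data capabilities;
2. Support building data topic systems for operations; through data, algorithms, and product, empower the business with end-to-end, analyzable, reusable data capabilities, and deliver more intuitive productized capabilities that better guide analysis;
3. Build company-level core data assets, tightly integrated with business scenarios, providing data and product solutions for community services that turn data into services and into business;
4. Build site-wide data governance and management, combining business, metadata, and technology to safeguard data quality and stable output for each of the company's business services.
Updated 2025-09-29 | Beijing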
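Experienced hire | 2+ years
1. Own data modeling and analysis for several complete business scenarios, including Taobao user growth, interaction, and Mobile Taobao app products; through scientifically defined metric systems and exploratory data analysis, accurately describe the state of the business and quickly find and pinpoint problems and opportunities in each area, guiding business decisions;
2. Develop a deep understanding of the e-commerce business and scientific decision-making, support continuous optimization and innovation of algorithms and strategies, and help drive sustainable rapid growth;
3. Decompose and define complex problems; use dimensional modeling and similar techniques to build the data middle layer and data marts; complete in-depth, topic-specific data analysis and insight based on business needs; and use data to discover, predict, and drive product-experience optimization and business development.
Updated 2025-07-11 | Hangzhou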
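Experienced hire | 3+ years | A180726
1. Define and execute the common-layer modeling, architecture design, and roadmap for the advertising data warehouse;
2. Drive the design and implementation of cross-team projects and key business iterations, ensuring high-quality delivery;
3. Own the stability of production pipelines; for problems that arise in data pipelines at runtime, design and implement the necessary mechanisms and tools, such as data quality assurance and disaster recovery, to keep online data pipelines stable overall.
Updated 2025-03-19 | Shanghai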