小鹏汽车大数据开发工程师(数仓方向)-上海
社招全职地点:上海状态:招聘
任职要求
1.本科及以上学历,计算机类相关专业,有不错的数据仓库开发经验,能写高质量的SQL代码 2.熟悉大数据生态环境,掌握Hadoop,Hive,Kafka,Spark,Flink,Redis,ElasticSearch等大数据技术栈 3.优秀的编程和调试能力,精通至少一种主流编程语言, 如Java,Python,Go 4.了解并且使用过StarRocks/Doris等高性能OLAP查询系统,PowerBI、QuickBI等BI报表工具 5.有快速学习能力,能快速理解业务背景,善于沟通,主动性强,有责任心,具备优秀的技术与业务结合能力
工作职责
1.负责自动驾驶业务数据的数据采集、清洗、转换和加载(ETL)流程,构建和维护车端信号数据仓库体系和数据指标体系 2.支持C端用户和B端分析的各种数据需求 3.参与数据治理工作(如数据质量核查、元数据管理等) 建立监控和反馈指标,持续优化改进产品的架构及性能,保证PB级数仓的数据质量和平台稳定性
包括英文材料
学历+
数据仓库+
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
SQL+
https://liaoxuefeng.com/books/sql/introduction/index.html
什么是SQL?简单地说,SQL就是访问和处理关系数据库的计算机标准语言。
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
大数据+
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Hadoop+
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop 为庞大的计算机集群提供可靠的、可伸缩的应用层计算和存储支持,它允许使用简单的编程模型跨计算机群集分布式处理大型数据集,并且支持在单台计算机到几千台计算机之间进行扩展。
[英文] Hadoop Tutorial
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models.
Hive+
[英文] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Kafka+
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
Spark+
[英文] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink+
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Redis+
[英文] Developer Hub
https://redis.io/dev/
Get all the tutorials, learning paths, and more you need to start building—fast.
https://www.runoob.com/redis/redis-tutorial.html
REmote DIctionary Server(Redis) 是一个由 Salvatore Sanfilippo 写的 key-value 存储系统,是跨平台的非关系型数据库。
https://www.youtube.com/watch?v=jgpVdJB2sKQ
In this video I will be covering Redis in depth from how to install it, what commands you can use, all the way to how to use it in a real world project.
ElasticSearch+
https://www.youtube.com/watch?v=a4HBKEda_F8
Learn about Elasticsearch with this comprehensive course designed for beginners, featuring both theoretical concepts and hands-on applications using Python (though applicable to any programming language). The course is structured in two parts: first covering essential Elasticsearch fundamentals including index management, document storage, text analysis, pipeline creation, search functionality, and advanced features like semantic search and embeddings; followed by a practical section where you'll build a real-world website using Elasticsearch as a search engine, working with the Astronomy Picture of the Day (APOD) dataset to implement features such as data cleaning pipelines, tokenization, pagination, and aggregations.
Java+
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
中文,免费,零起点,完整示例,基于最新的Python 3版本。
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Go+
https://www.youtube.com/watch?v=8uiZC0l4Ajw
学习Golang的完整教程!从开始到结束不到一个小时,包括如何在Go中构建API的完整演示。没有多余的内容,只有你需要知道的知识。
StarRocks+
https://docs.starrocks.io/docs/quick_start/
These Quick Start guides will help you get going with a small StarRocks environment.
https://itnext.io/introduction-to-starrocks-a-new-modern-analytical-database-1db2177d26e1
Recently, I had the opportunity to explore StarRocks which is the new kid in the block when talking about massive scale databases which are able to handle petabytes of data.
Doris+
https://doris.apache.org/docs/gettingStarted/what-is-apache-doris
OLAP+
https://www.youtube.com/watch?v=iw-5kFzIdgY
OLAP (for online analytical processing) is software for performing multidimensional analysis at high speeds on large volumes of data from a data warehouse, data mart, or some other unified, centralized data store.
Power BI+
[英文] Power BI Tutorial
https://www.tutorialspoint.com/power_bi/index.htm
Power BI is a Data Visualization and Business Intelligence tool that converts data from different data sources to interactive dashboards and BI reports.
https://www.youtube.com/watch?v=FwjaHCVNBWA
相关职位

社招2年以上技术支持
1、负责离线与实时数据仓库的构建,负责数据模型的设计和开发; 2、负责公司基础数据的开发、调优、维护等工作; 3、负责游戏指标体系建设与维护; 4、深入业务,理解并合理抽象业务需求,发挥数据价值,与业务团队紧密合作; 5、参与数据治理工作,提升数据易用性、规范性和数据质量。
更新于 2025-03-31
校招程序技术类
1.负责数据收集、清洗、转换和处理(ETL)流程的设计,开发和实施。 2.参与数据体系的技术调研、验证。 3.监控数据质量,确保数据的准确性和完整性。 4.参与数据仓库的建设、模型的梳理和建立。
社招4年以上核心本地商业-基
1.承担美团服务零售业务线的数仓设计和开发工作; 2.承担业务方应用层数据的搭建和开发工作; 3.承担服务零售业务数据质量、成本、安全等各方向数据治理工作; 4.业务方数据问题的统一接口人与综合解决方案提供方,对外提供一站式服务; 5.跨团队沟通、推动数据生产链路上的问题改进。
更新于 2025-04-03