理想汽车数据开发实习生
实习兼职车辆控制地点:北京状态:招聘
任职要求
1. 精通Java/Scala/Python至少一门语言(其中python必选),熟悉Linux开发环境与脚本编程; 2. 理解并掌握Hadoop生态(HDFS/YARN/MapReduce)及Spark/Flink计算引擎的开发应用,有性能调优经验; 3. 熟悉数据仓库建模理论(维度建模、分层设计),掌握Hive、HBase、Kafka等组件的原理与应用; 4. 熟练使用OLAP引擎(如Doris、ClickHouse)和SQL优化技巧; 5. 熟练使用主流可视化工具(Superset,Davinci等)设计开发数据看板; 6. 如能使用统计分析和机器学习技术对数据进行建模和分析者优先。
工作职责
1. 参与车控算法的RD和开发项目,负责车控算法数仓建立、数据集成的工作; 2. 设计和开发高效、可扩展的ETL数据管道,优化数据清洗、转换和加载流程; 3. 参与数据仓库(如Hive、ClickHouse)、实时数仓(如Flink、Kafka)的架构设计与开发; 4. 对接业务需求,开发数据服务接口,为数据分析、机器学习等场景提供高质量数据支持; 5. 解决大数据集群的性能瓶颈,调优Hadoop/Spark/Flink等框架的资源利用率与计算效率。
包括英文材料
Java+
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Scala+
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
中文,免费,零起点,完整示例,基于最新的Python 3版本。
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Linux+
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
脚本+
[英文] Scripting language
https://en.wikipedia.org/wiki/Scripting_language
https://zhuanlan.zhihu.com/p/571097954
一个脚本通常是解释执行而非编译。脚本语言通常都有简单、易学、易用的特性,目的就是希望能让程序员快速完成程序的编写工作。
Hadoop+
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop 为庞大的计算机集群提供可靠的、可伸缩的应用层计算和存储支持,它允许使用简单的编程模型跨计算机群集分布式处理大型数据集,并且支持在单台计算机到几千台计算机之间进行扩展。
[英文] Hadoop Tutorial
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models.
HDFS+
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
https://www.ibm.com/cn-zh/think/topics/hdfs
Hadoop 分布式文件系统 (HDFS) 是一种管理大型数据集的文件系统,可在商用硬件上运行。
Yarn+
[英文] Introduction
https://yarnpkg.com/getting-started
Yarn is an established open-source package manager used to manage dependencies in JavaScript projects.
MapReduce+
https://www.youtube.com/watch?v=bcjSe0xCHbE
https://www.youtube.com/watch?v=cHGaQz0E7AU
In this video I explain the basics of Map Reduce model, an important concept for any software engineer to be aware of.
Spark+
[英文] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink+
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
性能调优+
https://goperf.dev/
The Go App Optimization Guide is a series of in-depth, technical articles for developers who want to get more performance out of their Go code without relying on guesswork or cargo cult patterns.
https://web.dev/learn/performance
This course is designed for those new to web performance, a vital aspect of the user experience.
https://www.ibm.com/think/insights/application-performance-optimization
Application performance is not just a simple concern for most organizations; it’s a critical factor in their business’s success.
https://www.oreilly.com/library/view/optimizing-java/9781492039259/
Performance tuning is an experimental science, but that doesn’t mean engineers should resort to guesswork and folklore to get the job done.
数据仓库+
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
Hive+
[英文] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
HBase+
[英文] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model that is similar to Google's big table designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with HBase shell.
Kafka+
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
OLAP+
https://www.youtube.com/watch?v=iw-5kFzIdgY
OLAP (for online analytical processing) is software for performing multidimensional analysis at high speeds on large volumes of data from a data warehouse, data mart, or some other unified, centralized data store.
Doris+
https://doris.apache.org/docs/gettingStarted/what-is-apache-doris
ClickHouse+
[英文] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
SQL+
https://liaoxuefeng.com/books/sql/introduction/index.html
什么是SQL?简单地说,SQL就是访问和处理关系数据库的计算机标准语言。
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
机器学习+
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
相关职位
实习数据开发
1. 参与车辆数据平台数据仓库及数据应用服务的研发工作; 2. 参与实时、离线、流批一体数据仓库的建设,数据方案设计,模型开发,指标体系的开发和数据治理; 3. 支持业务团队的数据分析工作,负责面向业务的统计报表,数据提取等工作; 4. 参与解决项目中的问题和技术难题,线上疑难问题排查和解决; 5. 理解数据仓库架构,在项目实施的过程中,发现并解决各种维度/粒度的数据问题。
社招软件研发
作为蔚来汽车整车应用软件中心数据算法组的数据开发实习生,你的职责包括: 1. 基于智能网联汽车大数据,建立功能与业务分析框架,定量分析用户行为,推进整车应用软件的持续更新。 2. 参与团队平台化数字化建设,提供数据建模与开发支持。 3. 与产品经理、数据科学家深度合作,打造车-云-算法闭环生态。
更新于 2023-06-28

实习金山世游
岗位职责: 1、使用 Go 语言构建产品化的云原生高性能数据平台,支持日均上亿条的流式数据处理任务; 2、使用 Python 开发工具链和自动化脚本,提升数据开发、分析和运营的工作效率; 3、应用 FinOps 方法论,基于数据分析,解决云服务的成本分析和优化问题,建立资源使用效能评估体系; 4、与数据分析师协同工作,为复杂的分析场景和模型,提供可以落地的解决方案,提高分析工作的效率; 5、在排除故障和数据修复等领域,提出有建设性的解决方案,构建数据质量监控体系。
更新于 2025-08-12