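Li Auto Data Development Intern
Internship | Part-time | Vehicle Control | Location: Beijing | Status: Hiring
Requirements
1. Proficient in at least one of Java/Scala/Python (Python is mandatory), and familiar with the Linux development environment and script programming;
2. Understand and have a working command of the Hadoop ecosystem (HDFS/YARN/MapReduce) and of application development on the Spark/Flink compute engines, with performance tuning experience;
3. Familiar with data warehouse modeling theory (dimensional modeling, layered de…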
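Responsibilities
1. Take part in R&D and development projects for vehicle control algorithms, with responsibility for building the vehicle control algorithm data warehouse and for data integration;
2. Design and develop efficient, scalable ETL data pipelines, and optimize the data cleaning, transformation, and loading processes;
3. Take part in the architecture design and development of data warehouses (e.g. Hive, ClickHouse) and real-time data warehouses (e.g. Flink, Kafka);
4. Work with business requirements and develop data service interfaces that provide high-quality data for data analysis, machine learning, and similar use cases;
5. Resolve performance bottlenecks in big data clusters, and tune the resource utilization and compute efficiency of frameworks such as Hadoop/Spark/Flink.
Includes English-language materials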
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Scala
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, starts from zero, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Linux
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Scripting
[English] Scripting language
https://en.wikipedia.org/wiki/Scripting_language
https://zhuanlan.zhihu.com/p/571097954
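A script is usually interpreted rather than compiled. Scripting languages tend to be simple, easy to learn, and easy to use, with the aim of letting programmers write programs quickly.
Hadoop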
https://www.runoob.com/w3cnote/hadoop-tutorial.html
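Hadoop provides reliable, scalable application-layer compute and storage support for very large computer clusters. It allows large data sets to be processed in a distributed way across clusters of computers using a simple programming model, and it scales from a single machine to several thousand machines.
[English] Hadoop Tutorial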
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models.
HDFS
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
https://www.ibm.com/cn-zh/think/topics/hdfs
The Hadoop Distributed File System (HDFS) is a file system that manages large data sets and runs on commodity hardware.
Yarn
[English] Introduction
https://yarnpkg.com/getting-started
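Yarn is an established open-source package manager used to manage dependencies in JavaScript projects. (This resource covers the JavaScript tool, not Hadoop YARN, the cluster resource manager named in the requirements.)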
MapReduce
https://www.youtube.com/watch?v=bcjSe0xCHbE
https://www.youtube.com/watch?v=cHGaQz0E7AU
In this video I explain the basics of Map Reduce model, an important concept for any software engineer to be aware of.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
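Related positions
Internship | D13739
1. Responsible for big data R&D on the platforms behind Kuaishou's commercial brand marketing; build a brand asset platform for advertisers and creators and accumulate core user data assets; build industry-leading commercial CDP data;
2. Take part in the architecture, planning, and rollout of the data warehouse;
3. Ensure that offline and real-time data is produced on time and reliably, and safeguard data quality;
4. Take part in building efficiency tooling to keep improving data development productivity.
Updated 2025-07-17 | Beijing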
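Internship | D12319
1. Take part in the design and construction of Kuaishou's big data system, managing and building several thousand petabytes of data through the data warehouse, metadata, data management, and related systems;
2. Take part in building topic-oriented data systems for the commercial business; through an understanding of how the data is built and applied, support business management decisions and business operations, and use your own business sense to uncover the business value of the data;
3. Enjoy a great team atmosphere and immerse yourself in world-leading big data processing and application technologies.
Updated 2025-04-01 | Beijing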
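Internship | Xiaodu Technology
- Build platforms and tools for offline and real-time streaming big data analysis
- Take part in the storage and querying of massive data sets
- Take part in building the data models and computing the data metrics that support the business
- Work with distributed compute and storage platforms such as Hadoop, Spark, and ES
Updated 2024-04-19 | Beijing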