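Li Auto Thermal Management Data Analysis Engineer - Beijing
Experienced hire | Full-time | 3+ years | Automotive R&D | Location: Beijing | Status: Hiring
Requirements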
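1. Bachelor's degree or above in computer science, software engineering, data science, or a related field, with 3+ years of hands-on experience in machine learning data engineering or MLOps;
2. Proficient in Python and SQL; skilled with Pandas and PySpark; understands Spark execution internals and performance tuning;
3. Familiar with the training and serving pipelines of machine learning frameworks such as Scikit-learn, TensorFlow, or PyTorch;
4. Experienced with data warehouse / data lake technologies such as Hive, Iceberg, and Delta Lake, with data modeling and storage optimization skills;
5. Working knowledge of Hadoop, Kafka, Airflow / Kubeflow, Docke…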
Responsibilities
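1. Develop and maintain data preprocessing and feature engineering pipelines (missing-value and outlier handling, standardization, encoding, etc.), ensuring consistency between model training data and online inference data;
2. Work with the algorithm team to continuously refine how training datasets are built, supporting model iteration and testing workflows;
3. Build a data quality monitoring system; locate and analyze the root causes of data issues, propose improvements, and drive the optimization of data sources or processing workflows through to implementation;
4. Optimize the data warehouse / data lake storage architecture and Spark compute resource configuration to improve data query and model training efficiency;
5. Act as the department's point of contact with the company-level big data platform; take part in platform requirements reviews, interface design, and system integration; produce technical documentation and specifications;
6. Assist the platform team with job scheduling, performance tuning, and the design of monitoring and alerting strategies to keep data pipelines stable and efficient;
7. Help define development standards for data processing and MLOps, raising the level of process automation and team collaboration efficiency.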
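As a rough illustration of the first responsibility (keeping training and online inference data consistent), the sketch below fits preprocessing parameters once on training data and reuses them unchanged at inference time. It is a minimal sketch in plain Python, not the team's actual pipeline: the `Preprocessor` class, the cabin-temperature and HVAC-mode fields, the sample values, and the 5th/95th-percentile clipping rule are all assumptions made for illustration. In practice this logic would more likely live in Pandas, PySpark, or a scikit-learn pipeline.

```python
from dataclasses import dataclass
from statistics import mean, median, pstdev


@dataclass
class Preprocessor:
    """Hypothetical preprocessor: parameters are learned once from training
    data and reused as-is for online inference."""
    fill_value: float = 0.0   # median used to impute missing readings
    lower: float = 0.0        # lower clipping bound for outliers
    upper: float = 0.0        # upper clipping bound for outliers
    mu: float = 0.0           # mean used for standardization
    sigma: float = 1.0        # standard deviation used for standardization
    categories: tuple = ()    # vocabulary for one-hot encoding

    def fit(self, temps, modes):
        observed = sorted(t for t in temps if t is not None)
        self.fill_value = median(observed)
        # Assumed rule: clip to the values at the 5th/95th percentile positions.
        n = len(observed)
        self.lower = observed[int(0.05 * (n - 1))]
        self.upper = observed[int(0.95 * (n - 1))]
        cleaned = [self._clean(t) for t in temps]
        self.mu = mean(cleaned)
        self.sigma = pstdev(cleaned) or 1.0  # guard against zero variance
        self.categories = tuple(sorted(set(modes)))
        return self

    def _clean(self, temp):
        # Impute a missing value, then clip outliers to the fitted bounds.
        value = self.fill_value if temp is None else temp
        return min(max(value, self.lower), self.upper)

    def transform(self, temp, mode):
        # Same steps and same fitted parameters for training and inference.
        z = (self._clean(temp) - self.mu) / self.sigma
        one_hot = [1.0 if mode == c else 0.0 for c in self.categories]
        return [z] + one_hot


# Made-up sample data: cabin temperatures (one missing, one outlier) and modes.
train_temps = [21.5, None, 22.0, 95.0, 20.5, 21.0]
train_modes = ["cool", "heat", "cool", "auto", "heat", "cool"]

pre = Preprocessor().fit(train_temps, train_modes)
features = pre.transform(None, "heat")  # a missing reading at inference time
```

Because `transform` only reads the fitted parameters, a category that never appeared in training maps to an all-zero one-hot vector rather than changing the feature layout, which is one simple way to keep the online feature shape stable.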
Includes English-language materials
Education
Data Science
https://roadmap.sh/ai-data-scientist
Step by step roadmap guide to becoming an AI and Data Scientist
Machine Learning
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, beginner-friendly, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
SQL
https://liaoxuefeng.com/books/sql/introduction/index.html
What is SQL? Simply put, SQL is the standard computer language for accessing and manipulating relational databases.
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
Pandas
[English] 10 minutes to pandas
https://pandas.pydata.org/docs/user_guide/10min.html
This is a short introduction to pandas, geared mainly for new users.
[English] Cookbook - pandas
https://pandas.pydata.org/docs/user_guide/cookbook.html#cookbook
This is a repository for short and sweet examples and links for useful pandas recipes.
https://www.kaggle.com/learn/pandas
Solve short hands-on challenges to perfect your data manipulation skills.
https://www.youtube.com/watch?v=2uvysYbKdjM
I'm super excited for this one. We're doing another complete Python Pandas tutorial walkthrough.
https://www.youtube.com/watch?v=Mdq1WWSdUtw
Filtering, Joins, Indexing, Data Cleaning, Visualizations
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Performance Tuning
https://goperf.dev/
The Go App Optimization Guide is a series of in-depth, technical articles for developers who want to get more performance out of their Go code without relying on guesswork or cargo cult patterns.
https://web.dev/learn/performance
This course is designed for those new to web performance, a vital aspect of the user experience.
https://www.ibm.com/think/insights/application-performance-optimization
Application performance is not just a simple concern for most organizations; it’s a critical factor in their business’s success.
https://www.oreilly.com/library/view/optimizing-java/9781492039259/
Performance tuning is an experimental science, but that doesn’t mean engineers should resort to guesswork and folklore to get the job done.
Scikit-learn
https://www.ibm.com/think/topics/scikit-learn
Scikit-learn, or sklearn, is an open source project and one of the most used machine learning (ML) libraries today.
https://www.youtube.com/watch?v=SIEaLBXr0rk
Today we do a crash course on Scikit-Learn, the go-to library in Python when it comes to traditional machine learning algorithms (i.e., not deep learning).
TensorFlow
https://www.youtube.com/watch?v=tpCFfeUEGs8
Ready to learn the fundamentals of TensorFlow and deep learning with Python? Well, you’ve come to the right place.
https://www.youtube.com/watch?v=ZUKz4125WNI
This part continues right where part one left off so get that Google Colab window open and get ready to write plenty more TensorFlow code.
PyTorch
https://datawhalechina.github.io/thorough-pytorch/
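PyTorch is an important tool for data science research with deep learning. It offers considerable advantages in flexibility, readability, and performance, and in recent years it has become the framework most commonly used in academia to implement deep learning algorithms.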
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Iceberg
https://iceberg.apache.org/spark-quickstart/
This guide will get you up and running with Apache Iceberg™ using Apache Spark™, including sample code to highlight some powerful features.
https://www.baeldung.com/apache-iceberg-intro
This tutorial will discuss Apache Iceberg, a popular open table format in today’s big data landscape.
https://www.youtube.com/watch?v=TsmhRZElPvM
You’ve probably heard about Apache Iceberg™—after all, it’s been getting a lot of buzz.
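Related positions
Campus hire | Thermal Management
1. Requirements analysis for intelligent air-conditioning thermal management algorithms;
2. Model development for intelligent air-conditioning thermal management algorithms;
3. Setup and maintenance of the simulation environment for intelligent air-conditioning thermal management algorithms;
4. Integration and deployment of intelligent air-conditioning thermal management algorithms;
5. Big data mining and analysis for air-conditioning thermal management, and algorithm optimization;
6. Tracking, analysis, and resolution of air-conditioning thermal management algorithm issues.
Shanghai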
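Campus hire | Thermal Management
1. Requirements analysis for air-conditioning thermal management system control software;
2. Development of test benches for air-conditioning thermal management system control software;
3. Bench testing and in-vehicle testing of the air-conditioning thermal management system, and test case development;
4. System performance calibration and validation of the air-conditioning thermal management system;
5. System and comfort calibration of the air-conditioning thermal management system, and definition of the comfort calibration process;
6. Tracking, analysis, and resolution of air-conditioning thermal management system issues;
7. Big data mining and analysis for the air-conditioning thermal management system to optimize its functional and performance behavior;
8. Maturity control of air-conditioning thermal management performance calibration.
Shanghai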
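Experienced hire | 6+ years | Automotive R&D
1. Draft system functional specifications (electrical schematics, pin signal definitions, communication signal definitions, and electrical product specification parameters) based on the functional and performance requirements of the vehicle's air-conditioning thermal management system;
2. Define and develop the functional algorithms, strategies, and functional requirements of the air-conditioning thermal management system control software;
3. Review the functional content of the air-conditioning thermal management system software design documents;
4. Write the diagnostic specifications for the air-conditioning thermal management system;
5. Track, analyze, and resolve all types of air-conditioning thermal management system issues;
6. Carry out big data mining and analysis for the air-conditioning thermal management system to optimize its functional and performance behavior;
7. Own the software design and development plan for the air-conditioning thermal management system and drive its execution;
8. Track cutting-edge personalization and intelligent technologies for air-conditioning thermal management, and develop new technologies;
9. Communicate and coordinate system requirements with the relevant vehicle-level departments;
10. This position can be based in Beijing or Shanghai.
Shanghai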