
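Hello (哈啰) Data Closed-Loop Engineer [Autonomous Driving]
Experienced hire | Full-time | Technology | Location: Shanghai | Status: Hiring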
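Requirements
1. Bachelor's degree or above in a science or engineering field such as computer science, software engineering, automation, or pattern recognition, with at least 2 years of work experience.
2. Strong programming skills: proficient in Python and C++, familiar with Linux commands and SQL databases, with good programming habits and a standardized coding style.
3. Familiar with the process framework of an autonomous-driving data factory, understands the purpose of each data-factory function, and has some ability to define data-factory products.
4. Familiar with the principles and data characteristics of the sensors used in autonomous driving, with hands-on experience processing data from the main sensors; familiar with common encoding and decoding formats such as H264 and YUV. Bonus: experience with custom data formats.
5. Familiar with common autonomous-driving middleware, including but not limited to ROS2, DDS, SOME/IP, and ZMQ. Bonus: experience developing middleware and deploying it on SoCs.
6. Familiar with TCP/IP network protocols and network programming, inter-process communication, and multithreaded programming.
7. Familiar with the principles and development workflow of automatic data labeling. Bonus: experience developing and deploying NN models.
8. Reliable and down-to-earth, with good communication skills both within the team and across teams.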
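Responsibilities
1. Build the company's data closed-loop system, handle data volumes on the order of hundreds of millions of records, and provide data production-line support for end-to-end model training.
2. Lead development of the data closed-loop toolchain to accelerate autonomous-driving model iteration, covering the following areas:
2.1 Data filters: develop hot-updatable on-vehicle filters and shadow mode, design and implement trigger logic, and connect the link from cloud configuration to the vehicle.
2.2 Data recording and upload: record data on the vehicle, including raw-data packaging, compression, coordinate offsetting, desensitization, and encryption, and connect the vehicle-to-cloud upload link.
2.3 Cloud data processing: for labeled data and scenario data, develop automation tools for data cleaning, encoding and decoding, automatic labeling, high-value data mining, data retrieval, and evaluation, adapted to the autonomous-driving model data pipeline.
3. Data closed loop based on cloud resources and the map provider's compliance cloud: build labeling and training platforms on cloud resources, integrate them with the intelligent-driving algorithms, and develop the data pipelines.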
Includes English-language materials
Pattern recognition
https://www.mathworks.com/discovery/pattern-recognition.html
Pattern recognition is the process of classifying input data into objects, classes, or categories using computer algorithms based on key features or regularities.
https://www.microsoft.com/en-us/research/wp-content/uploads/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
Pattern recognition has its origins in engineering, whereas machine learning grew out of computer science.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for complete beginners, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Linux
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
SQL
https://liaoxuefeng.com/books/sql/introduction/index.html
What is SQL? Put simply, SQL is the standard computer language for accessing and manipulating relational databases.
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
Coding standards
[English] Google Style Guides
https://google.github.io/styleguide/
Every major open-source project has its own style guide: a set of conventions (sometimes arbitrary) about how to write code for that project. It is much easier to understand a large codebase when all the code in it is in a consistent style.
Autonomous driving
https://www.youtube.com/watch?v=_q4WUxgwDeg&list=PL05umP7R6ij321zzKXK6XCQXAaaYjQbzr
Lecture: Self-Driving Cars (Prof. Andreas Geiger, University of Tübingen)
https://www.youtube.com/watch?v=NkI9ia2cLhc&list=PLB0Tybl0UNfYoJE7ZwsBQoDIG4YN9ptyY
You will learn to make a self-driving car simulation by implementing every component one by one. I will teach you how to implement the car driving mechanics, how to define the environment, how to simulate some sensors, how to detect collisions and how to make the car control itself using a neural network.
Middleware
https://www.youtube.com/watch?v=1oWPUpMheGk
SoC
https://www.arm.com/resources/education/books/modern-soc
The aim of this textbook is to expose aspiring and practising SoC designers to the fundamentals and latest developments in SoC design and technologies using examples of Arm Cortex-A technology and related IP blocks and interfaces.
https://www.arm.com/resources/education/education-kits/introduction-to-soc
To produce students with solid introductory knowledge on the basics of SoC design and key practical skills required to implement a simple SoC on an FPGA and write embedded programs targeted at the microprocessor to control the peripherals.
https://www.youtube.com/watch?v=dokgLSAhqHI
A key part of the digital innovation revolution has been the embrace of the SoC, or system-on-chip.
TCP/IP
[English] What is TCP/IP?
https://www.techtarget.com/searchnetworking/definition/TCP-IP
TCP/IP stands for Transmission Control Protocol/Internet Protocol and is a suite of communication protocols used to interconnect network devices on the internet.
Multithreading
https://liaoxuefeng.com/books/java/threading/basic/index.html
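Compared with single-threaded programming, the defining feature of multithreaded programming is that threads frequently read and write shared data and therefore need synchronization.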
https://www.youtube.com/watch?v=_uQgGS_VIXM&list=PLsc-VaxfZl4do3Etp_xQ0aQBoC-x5BIgJ
https://www.youtube.com/watch?v=IEEhzQoKtQU
https://www.youtube.com/watch?v=mTGdtC9f4EU&list=PLL8woMHwr36EDxjUoCzboZjedsnhLP1j4
https://www.youtube.com/watch?v=TPVH_coGAQs&list=PLk6CEY9XxSIAeK-EAh3hB4fgNvYkYmghp
https://www.youtube.com/watch?v=xPqnoB2hjjA
This video is an introduction to multithreading in modern C++.
https://www.youtube.com/watch?v=YKBwKy5PrpQ
Rust threading is easy to implement and improves the efficiency of your applications on multi-core systems!
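Related positions
Experienced hire | A222944
1. Design and develop platforms for large-scale autonomous-driving data mining and data generation, efficiently supporting the algorithm teams' data needs.
2. Design and develop the autonomous-driving data generation platform, producing high-quality data for cloud simulation evaluation and model training.
3. Design and develop the autonomous-driving algorithm evaluation platform, driving efficient algorithm iteration and safeguarding the quality of algorithm releases.
Updated 2024-08-16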

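Experienced hire | 3+ years | Data mining
1. Design and develop the core systems of the autonomous-driving data closed loop, building a fully automated pipeline from on-vehicle data collection to cloud-side model iteration.
2. Optimize the on-vehicle Shadow Mode system and design efficient trigger-based data collection strategies that precisely capture valuable Hard Event data.
3. Develop intelligent data filters that combine a rule engine with AI models to efficiently select high-value training samples from massive road-test data.
4. Build a cloud-based automated data mining platform that automatically discovers and clusters abnormal scenarios and algorithm failure cases.
5. Drive iterative improvement of the automatic labeling system, automatically evaluate driving-behavior quality, automatically clean out bad-behavior data, and improve labeling efficiency and quality.
6. Work closely with the EBM team to turn data closed-loop output into sustained gains in model performance.
7. Build a data quality evaluation framework that ensures training data is accurate, diverse, and representative.
Updated 2025-09-19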
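Experienced hire | Technology
1) Design and implement a high-concurrency, high-availability backend architecture for the data labeling platform, supporting labeling of multimodal autonomous-driving data such as images, point clouds, and video.
2) Build a storage and governance system for labeled data, ensuring secure storage and efficient retrieval of large-scale (PB-level) labeled data.
3) Build a data production system for large autonomous-driving models, covering the full workflow of data cleaning, augmentation, synthesis, and labeling.
4) Build a data version management and traceback system that makes dataset iteration traceable and meets compliance requirements for model training.
Updated 2025-08-22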