NIO Software Engineer (Hardware Acceleration)
Experienced hire | Full-time | 3-5 years | Algorithms | Location: Hefei, Shanghai | Status: Hiring
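Requirements
1. Master's degree or above in computer science, automation, electronic engineering, robotics, or a related field (a bachelor's degree may be accepted for outstanding candidates);
2. Proficient in C/C++ and Python, with solid algorithm optimization and software engineering skills;
3. Expert in CUDA and OpenCL, with GPU or DSP acceleration and optimization experience and a good understanding of parallel computing principles;
4. Familiar with inference acceleration libraries such as TensorRT, TVM, XLA, and oneDNN (MKL-DNN); operator-level optimization experience is preferred;
5. Strong mathematical foundation, including linear algebra, numerical optimization, and probability and statistics.
Bonus points
1. Familiar with robotics development frameworks such as ROS, ROS2, and Apollo;
2. Experience with on-device optimization for NPU, DSP, or FPGA, or with low-power AI compute optimization;
3. Experience in intelligent driver assistance or related fields;
4. Development and deployment experience on Horizon RDK boards is a major plus;
5. Publications at NeurIPS, ICRA, CVPR, or ICLR on high-performance computing or robotics algorithm optimization.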
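Responsibilities
1. Algorithm acceleration and on-board model deployment for intelligent devices: optimize core algorithms such as SLAM (simultaneous localization and mapping), visual perception, path planning, and motion control to improve real-time performance and computational efficiency;
2. Hardware acceleration: implement efficient parallel computing on hardware accelerators such as GPU (CUDA), BPU (Horizon RDK), NPU, and FPGA to speed up inference and training;
3. Deep learning optimization: accelerate inference for tasks such as object detection, semantic segmentation, and 3D point cloud processing using frameworks such as TensorRT, TVM, and oneDNN;
4. Operator optimization: optimize custom operators within ecosystems such as TensorFlow, PyTorch, and ROS to improve computation graph execution efficiency;
5. System integration: work with the robotics software and hardware teams to ensure optimized algorithms integrate seamlessly and meet real-time and power requirements;
6. Cross-platform development: develop system-side applications for existing and new smart factory equipment.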
Includes English-language materials
C
https://www.freecodecamp.org/chinese/news/the-c-beginners-handbook/
This handbook follows the 80/20 rule: you will learn 80% of the C programming language in 20% of the time.
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic C Programming language book written by Brian Kernighan and Dennis Ritchie.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, beginner-friendly, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
CUDA
https://developer.nvidia.com/blog/even-easier-introduction-cuda/
This post is a super simple introduction to CUDA, the popular parallel computing platform and programming model from NVIDIA.
https://www.youtube.com/watch?v=86FAWCzIe_4
Learn how to program with Nvidia CUDA and leverage GPUs for high-performance computing and deep learning.
OpenCL
https://developer.nvidia.com/opencl
OpenCL™ (Open Computing Language) is a low-level API for heterogeneous computing that runs on CUDA-powered GPUs.
https://engineering.purdue.edu/~smidkiff/ece563/NVidiaGPUTeachingToolkit/Mod20OpenCL/3rd-Edition-AppendixA-intro-to-OpenCL.pdf
We will give a brief overview of OpenCL for CUDA programmers.
[English] Hands On OpenCL
https://handsonopencl.github.io/
An open source two-day lecture course for teaching and learning OpenCL
https://leonardoaraujosantos.gitbook.io/opencl/chapter1
Open Computing Language is a framework for writing programs that execute across heterogeneous platforms.
https://ulhpc-tutorials.readthedocs.io/en/latest/gpu/opencl/
OpenCL came as a standard for heterogeneous programming that enables a code to run in different platforms.
https://www.youtube.com/watch?v=4q9fPOI-x80
This presentation will show how to make use of the GPU from Java using OpenCL.
TensorRT
https://docs.nvidia.com/deeplearning/tensorrt/latest/getting-started/quick-start-guide.html
This TensorRT Quick Start Guide is a starting point for developers who want to try out the TensorRT SDK; specifically, it demonstrates how to quickly construct an application to run inference on a TensorRT engine.
ROS
https://www.youtube.com/watch?v=92Zz5nnd41c&list=PLk51HrKSBQ8-jTgD0qgRp1vmQeVSJ5SQC
https://www.youtube.com/watch?v=HJAE5Pk8Nyw
Ready to learn ROS2 and take your robotics skills to the next level?
https://www.youtube.com/watch?v=MWKnMPX0Yjg&list=PLU9tksFlQRircAdEplrH9NMm4WtSA8yzi
Do you want to know more about ROS the Robot Operating System?
Development frameworks
[English] Understanding Modern Development Frameworks: A Guide for Developers and Technical Decision-makers
https://www.freecodecamp.org/news/understanding-modern-development-frameworks-guide-for-devs/
FPGA
https://nandland.com/fpga-101/
These are the fundamental concepts that are important to understand when designing FPGAs.
NeurIPS
https://neurips.cc/
CVPR
https://cvpr.thecvf.com/
ICLR
https://iclr.cc/
Related positions
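Experienced hire BKY1
1. Design and implement FPGA acceleration systems for data center networking, storage, security, and related services; 2. Work with software engineers to analyze business requirements and select and design FPGA solutions; 3. Handle FPGA logic design, simulation, and debugging; 4. Handle automated operations and maintenance of FPGA acceleration products after launch; 5. Assist board-level hardware engineers in designing, developing, and debugging FPGA boards.
Updated 2020-11-04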
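Experienced hire J9Q71
1. Design and implement FPGA acceleration systems for data center networking, storage, security, and related services; 2. Work with software engineers to analyze business requirements and select and design FPGA solutions; 3. Handle FPGA logic design, simulation, and debugging; 4. Handle automated operations and maintenance of FPGA acceleration products after launch; 5. Assist board-level hardware engineers in designing, developing, and debugging FPGA boards.
Updated 2020-10-16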
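Experienced hire A50573
1. Design and implement FPGA acceleration systems for data center networking, storage, security, and related services; 2. Work with software engineers to analyze business requirements and select and design FPGA solutions; 3. Handle FPGA logic design, simulation, and debugging; 4. Handle automated operations and maintenance of FPGA acceleration products after launch; 5. Assist board-level hardware engineers in designing, developing, and debugging FPGA boards.
Updated 2023-08-08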