
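Momenta Cloud Model Performance Optimization Engineer
Experienced hire, full-time, Algorithms. Location: Beijing | Shanghai. Status: Hiring
Requirements
1. Proficient in C++/Python, with a solid grasp of operating system principles and computer science fundamentals
2. Familiar with computer architecture, with some understanding of compute acceleration units such as GPU, CPU, and NPU; with CUDA, Neon, Triton, and other…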
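Responsibilities
1. Deploy deep learning models (such as CNNs and Transformers) efficiently to in-vehicle or cloud clusters, optimizing inference latency, memory footprint, and power consumption
2. Implement quantization (INT8/FP8) for in-vehicle models, including large models, and carry out end-to-end performance tuning with frameworks such as TensorRT and torch
3. Develop or optimize high-performance operators, using CUDA, OpenCL, the NEON instruction set, or hardware acceleration libraries (cuDNN, oneDNN) to achieve peak performance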
Includes English-language materials
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese; free, beginner-friendly, with complete examples, based on the latest version of Python 3.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
CUDA
https://developer.nvidia.com/blog/even-easier-introduction-cuda/
This post is a super simple introduction to CUDA, the popular parallel computing platform and programming model from NVIDIA.
https://www.youtube.com/watch?v=86FAWCzIe_4
Learn how to program with Nvidia CUDA and leverage GPUs for high-performance computing and deep learning.
TensorRT
https://docs.nvidia.com/deeplearning/tensorrt/latest/getting-started/quick-start-guide.html
This TensorRT Quick Start Guide is a starting point for developers who want to try out the TensorRT SDK; specifically, it demonstrates how to quickly construct an application to run inference on a TensorRT engine.
vLLM
https://www.newline.co/@zaoyang/ultimate-guide-to-vllm--aad8b65d
vLLM is a framework designed to make large language models faster, more efficient, and better suited for production environments.
https://www.youtube.com/watch?v=Ju2FrqIrdx0
vLLM is a cutting-edge serving engine designed for large language models (LLMs), offering unparalleled performance and efficiency for AI-driven applications.
Related positions
Experienced hire, Technology
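* Design and implement a foundation model for behavior prediction and motion planning in autonomous driving, built on Vision-Language Models (VLM) and Large Language Models (LLM)
* Use pretrained multimodal large models for trajectory generation and fusion, improving the foundation model's ability to understand and predict the intentions of other traffic participants
* For in-vehicle and cloud deployment, carry out algorithm-level performance optimization of models, such as compression, pruning, distillation, and training and inference acceleration, to ensure model usability, real-time system behavior, and resource utilization
* Work closely with the algorithm, software, and systems teams to drive model integration and rollout on simulation and real in-vehicle platforms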
Updated 2025-09-04, Hangzhou
Internship
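Deeply optimize the training pipeline
* Lead performance analysis and optimization across the full model training pipeline, and design elastic GPU resource scheduling strategies
* Develop an automated training acceleration toolchain and build a scalable cloud training framework
* Develop advanced techniques such as mixed-precision training and gradient compression to break through training throughput bottlenecks
Build a training optimization system
* Define a standardized system for evaluating training efficiency and establish a quantitative cost-efficiency model
* Design a reusable library of training acceleration components and consolidate best practices
* Develop a performance analysis platform for the training process, enabling intelligent diagnosis of performance problems
Empower business R&D
* Optimize multi-task resource scheduling strategies to raise overall GPU cluster utilization
* Provide training acceleration solutions to algorithm teams, shortening the model iteration cycle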
Updated 2025-07-22, Beijing
