AMD Model Optimization Engineer (Inference & Training)
Experienced hire · Full-time · Engineering · Location: Beijing · Status: Hiring
Qualifications
* Strong software engineering in Python and C/C++.
* Practical experience with PyTorch/JAX and building/extending deep learning frameworks.
* Hands‑on CUDA and/or ROCm development; experience writing or optimizing GPU kernels.
* Experience with Triton (kernel development/optimization) is highly desired.
* Proven experience with model optimization techniques, especially low‑bitwidth quantization and other compression methods.
* Familiarity with GenAI inference engines and optimizations (e.g., vLLM, SGLang, xDiT, continuous batching, speculative decoding).
* Skilled at profiling and performance debugging across stack layers (operator → model → framework → hardware).

PREFERRED QUALIFICATIONS
* Publications or contributions in model optimization / ML systems are a strong plus.
* Experience with distributed traini…
Responsibilities
THE ROLE
We are looking for a hands‑on Engineer to design, implement, and optimize AI model training and inference solutions for AMD platforms. The role focuses on end‑to‑end performance and accuracy improvements at the framework, model, and operator levels, with strong emphasis on low‑bitwidth quantization, model compression, and real‑world deployment. You will work closely with AMD hardware and software teams, support customers, and contribute to open‑source projects and inference/training frameworks.

KEY RESPONSIBILITIES
* Design, implement, and optimize inference and training pipelines for AMD GPUs/accelerators at the framework, model, and operator levels.
* Lead research and development of model optimization algorithms: low‑bitwidth quantization, pruning/sparsity, compression, efficient attention mechanisms, and lightweight architectures.
* Implement and tune CUDA/ROCm/Triton kernels for critical operators; profile and eliminate performance bottlenecks.
* Integrate and optimize models for PyTorch/JAX and common distributed training/inference stacks (Torchtitan, Megatron, DeepSpeed, HF Transformers, etc.).
* Reduce latency and increase throughput for large‑model inference (e.g., batching strategies, caching, speculative decoding).
* Contribute to and/or maintain open‑source inference/training tools, ensuring production readiness and community adoption.
* Provide technical support and guidance to customers and internal teams to achieve target accuracy and performance on AMD platforms.

TECHNICAL
Includes English-language materials
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for absolute beginners, with complete examples, based on the latest Python 3.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
C
https://www.freecodecamp.org/chinese/news/the-c-beginners-handbook/
This handbook follows the 80/20 rule: you will learn 80% of the C programming language in 20% of the time.
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic C Programming language book written by Brian Kernighan and Dennis Ritchie.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
PyTorch
https://datawhalechina.github.io/thorough-pytorch/
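PyTorch is an important tool for data science research using deep learning. It offers considerable advantages in flexibility, readability, and performance, and in recent years has become the most commonly used framework for implementing deep learning algorithms in academia.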
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
JAX
https://docs.jax.dev/en/latest/notebooks/thinking_in_jax.html
JAX is a library for array-oriented numerical computation, with automatic differentiation and JIT compilation to enable high-performance machine learning research.
Development Frameworks
[English] Understanding Modern Development Frameworks: A Guide for Developers and Technical Decision-makers
https://www.freecodecamp.org/news/understanding-modern-development-frameworks-guide-for-devs/
Related positions
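Experienced hire · MEG
- Lead engineering architecture R&D for model optimization, covering prediction architecture, feature engineering, model training, and inference optimization.
- Optimize core model inference/training performance and drive the evolution of the in-house inference and training frameworks.
- Optimize online high-concurrency, high-availability service architectures and offline high-load, large-data-volume service architectures.
- Work with the team to tackle technical challenges across scenarios that demand high performance, high concurrency, and high availability.
Updated 2024-08-12 · Beijing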
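Experienced hire · A113845
1. Optimize the performance of the online inference framework for large language models, addressing core issues such as high concurrency, low latency, and high reliability to improve service throughput and stability.
2. Design and implement distributed large-model inference systems, optimizing multi-GPU (e.g., NVIDIA GPU clusters) resource scheduling and communication efficiency to support training/inference at the scale of thousands of GPUs.
3. Adapt deeply to NVIDIA GPU hardware architecture, using toolchains such as CUDA and cuDNN for operator-level optimization to improve model compute efficiency and GPU memory utilization.
4. Investigate and adopt cutting-edge techniques (e.g., heterogeneous computing, AI compiler optimization), and drive lightweight approaches such as model quantization and distillation into production.
Updated 2024-09-24 · Beijing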
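Experienced hire · Technology
1. Optimize and develop TapTap's offline training and online inference frameworks, serving business lines across the company such as search, recommendation, advertising, and AI.
2. Collaborate closely with the company's algorithm teams to analyze business performance bottlenecks and system architecture characteristics, combining software and hardware optimization to achieve peak performance.
3. Design and implement machine learning infrastructure, algorithm frameworks, and toolchains, and drive their adoption in the business.
4. Explore cutting-edge machine learning technologies in the industry, continuously improve platform capabilities, and reduce the cost of using algorithms.
Updated 2025-11-19 · Shanghai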