
AMD Model Optimization Engineer (Inference & Training)

Experienced hire · Full-time · Engineering · Location: Beijing · Status: Hiring

Requirements


* Strong software engineering in Python and C/C++.
* Practical experience with PyTorch/JAX and building/extending deep learning frameworks.
* Hands‑on CUDA and/or ROCm development; experience writing or optimizing GPU kernels.
* Experience with Triton (kernel development/optimization) is highly desired.
* Proven experience with model optimization techniques, especially low‑bitwidth quantization and other compression methods (see the sketch after this list).
* Familiarity with GenAI inference engines and optimizations (e.g., vLLM, SGLang, xDiT, continuous batching, speculative decoding).
* Skilled at profiling and performance debugging across stack layers (operator → model → framework → hardware).

PREFERRED QUALIFICATIONS

* Publications or contributions in model optimization / ML systems are a strong plus.
* Experience with distributed traini…
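As a concrete illustration of the low‑bitwidth quantization experience listed above, here is a minimal sketch of symmetric per-tensor INT8 weight quantization in PyTorch. The function names and the error check are this sketch's own illustrative choices, not part of the posting.

```python
# Minimal sketch: symmetric per-tensor INT8 quantization (illustrative only).
import torch

def quantize_int8(w: torch.Tensor):
    # Map the largest absolute weight onto the INT8 range [-127, 127].
    scale = w.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an FP32 approximation of the original weights.
    return q.to(torch.float32) * scale

if __name__ == "__main__":
    w = torch.randn(4096, 4096)
    q, scale = quantize_int8(w)
    w_hat = dequantize_int8(q, scale)
    # Mean absolute error gives a rough sense of the accuracy cost.
    print("mean abs quantization error:", (w - w_hat).abs().mean().item())
```

In practice, per-channel or per-group scales and lower bit widths (e.g., INT4, FP8) are common refinements of this basic per-tensor scheme.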

Responsibilities


THE ROLE

We are looking for a hands‑on Engineer to design, implement, and optimize AI model training and inference solutions for AMD platforms. The role focuses on end‑to‑end performance and accuracy improvements at the framework, model, and operator levels, with strong emphasis on low‑bitwidth quantization, model compression, and real‑world deployment. You will work closely with AMD hardware and software teams, support customers, and contribute to open‑source projects and inference/training frameworks.

KEY RESPONSIBILITIES

* Design, implement, and optimize inference and training pipelines for AMD GPUs/accelerators at the framework, model, and operator levels.
* Lead research and development of model optimization algorithms: low‑bitwidth quantization, pruning/sparsity, compression, efficient attention mechanisms, and lightweight architectures.
* Implement and tune CUDA/ROCm/Triton kernels for critical operators; profile and eliminate performance bottlenecks (see the kernel sketch after this list).
* Integrate and optimize models for PyTorch/JAX and common distributed training/inference stacks (Torchtitan, Megatron, DeepSpeed, HF Transformers, etc.).
* Reduce latency and increase throughput for large‑model inference (e.g., batching strategies, caching, speculative decoding).
* Contribute to and/or maintain open‑source inference/training tools, ensuring production readiness and community adoption.
* Provide technical support and guidance to customers and internal teams to achieve target accuracy and performance on AMD platforms.

TECHNICAL
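To illustrate the kind of kernel-level work the responsibilities describe, below is a minimal Triton elementwise kernel; Triton targets both CUDA and ROCm backends. The kernel name, block size, and wrapper function are arbitrary choices for this sketch, and the input tensors are assumed to already live on the GPU.

```python
# Minimal sketch: a Triton vector-add kernel (illustrative only).
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Assumes x and y are contiguous GPU tensors of the same shape.
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Real operator work (attention, GEMM epilogues, quantized matmuls) follows the same structure but adds tiling, autotuning, and memory-layout considerations.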
Related positions

Baidu · Experienced hire · MEG

- Responsible for model optimization engineering and architecture R&D, covering prediction/serving architecture, feature engineering, model training, and inference optimization.
- Optimize core model inference/training performance and drive the evolution of the in-house inference & training frameworks.
- Optimize online high-concurrency, high-availability service architectures as well as offline high-load, large-data-volume service architectures.
- Work with the team to tackle technical challenges across high-performance, high-concurrency, and high-availability scenarios.

Updated 2024-08-12 · Beijing
Xiaomi · Experienced hire · A113845

1. Optimize the performance of the online LLM inference framework, solving core problems of high concurrency, low latency, and high reliability to improve service throughput and stability.
2. Design and implement distributed large-model inference systems, optimizing multi-GPU (e.g., NVIDIA GPU cluster) resource scheduling and communication efficiency to support thousand-GPU-scale training/inference scenarios.
3. Deeply adapt to NVIDIA GPU hardware architectures, using toolchains such as CUDA and cuDNN for operator-level optimization to improve model compute efficiency and GPU memory utilization.
4. Research and introduce cutting-edge technologies (e.g., heterogeneous computing, AI compiler optimization) and drive the adoption of lightweight approaches such as model quantization and distillation.

Updated 2024-09-24 · Beijing
XD Inc. · Experienced hire · General technology track

1. Develop and optimize TapTap's offline training and online inference frameworks, serving all company business lines such as search, recommendation, advertising, and AI.
2. Collaborate closely with algorithm teams across the company, analyze business performance bottlenecks and system architecture characteristics, and combine software and hardware optimization to achieve top performance.
3. Design and implement machine-learning infrastructure, algorithm frameworks, and toolchains, and drive their adoption in the business.
4. Explore cutting-edge machine learning technologies in the industry, continuously improve platform capabilities, and reduce the cost of applying algorithms.

Updated 2025-11-19 · Shanghai
ByteDance · Experienced hire · A52633

1. Design and develop industry-leading, high-performance on-device and cloud algorithm engines that provide core atomic capabilities for scenarios such as speech recognition, conversational interaction, speech synthesis, and audio retrieval.
2. Deeply optimize core engines, including the device-cloud unified high-performance compute engine, audio feature processing engine, large-scale decoding engine, audio synthesis engine, audio effects engine, conversational interaction engine, audio retrieval engine, and other commonly used engines.
3. Evaluate and analyze the performance of deployed algorithms, define technical roadmaps and performance standards, and continuously strengthen key technical competitiveness.
4. Provide AI speech understanding, dialogue, and speech synthesis capabilities for ByteDance products (Toutiao, Douyin, Douyin Huoshan Edition, Xigua Video, Feishu, Tomato Novel, etc.), using AI to reach hundreds of millions of users.

Updated 2025-09-01 · Shenzhen