腾讯大模型推理优化研发工程师-算子优化
社招全职2年以上CSIG技术地点:上海状态:招聘
任职要求
1.经验要求:2年以上GPU高性能计算开发经验,有大规模AI模型优化经验者优先; 2.精通CUDA OpenCL等GPU编程语言,熟悉NVIDIA或AMD的底层优化技巧; 3.精通Triton,Cutlass,CK等高性能算子开发工具; 4.熟悉VLLM、SGLang等大模型推理框架,有实际性能调优经验(如KV Cache优化、动态批处理、Attention算子定制等); 5.扎实的高性能计算基础,熟悉并行计算、内存优化、通信优化等技术; 6.熟练使用C/C++、Python,具备良好的算法设计与代码实现能力; 7.熟悉Attention结构MHA/MQA/GQA/MLA,以及MOE结构。 加分项 1.熟悉DeepSeek系列模型的工程优化技术,包括但不限于PD分离、MLA、MOE负载均衡、MTP等有TensorRT、VLLM、SGLang、Triton Inference Server、MLIR/LLVM等编译优化工具链的实际项目经验; 2.在同等条件下,通过腾讯云认证或取得同等资格认证的候选人,我们会优先考虑。
工作职责
1.参与基于GPU的高性能计算(HPC)项目设计与开发,负责GPU芯片(NVIDIA,AMD等)的底层性能优化与调优; 2.针对大模型推理场景,优化和扩展vLLM、SGLang等框架的核心模块,提升计算效率与资源利用率; 3.深入分析GPU硬件架构特性(如Tensor Core、显存带宽、通信机制等),设计并实现高性能算子与算法; 4.探索前沿技术方向(如混合专家模型MOE、动态计算图编译优化、JIT等),推动AI工程化落地的效率提升。
包括英文材料
CUDA+
https://developer.nvidia.com/blog/even-easier-introduction-cuda/
This post is a super simple introduction to CUDA, the popular parallel computing platform and programming model from NVIDIA.
https://www.youtube.com/watch?v=86FAWCzIe_4
Lean how to program with Nvidia CUDA and leverage GPUs for high-performance computing and deep learning.
OpenCL+
https://developer.nvidia.com/opencl
OpenCL™ (Open Computing Language) is a low-level API for heterogeneous computing that runs on CUDA-powered GPUs.
https://engineering.purdue.edu/~smidkiff/ece563/NVidiaGPUTeachingToolkit/Mod20OpenCL/3rd-Edition-AppendixA-intro-to-OpenCL.pdf
we will give a brief overview of OpenCL for CUDA programers.
[英文] Hands On OpenCL
https://handsonopencl.github.io/
An open source two-day lecture course for teaching and learning OpenCL
https://leonardoaraujosantos.gitbook.io/opencl/chapter1
Open Computing Language is a framework for writing programs that execute across heterogeneous platforms.
https://ulhpc-tutorials.readthedocs.io/en/latest/gpu/opencl/
OpenCL came as a standard for heterogeneous programming that enables a code to run in different platforms.
https://www.youtube.com/watch?v=4q9fPOI-x80
This presentation will show how to make use of the GPU from Java using OpenCL.
大模型+
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
性能调优+
https://goperf.dev/
The Go App Optimization Guide is a series of in-depth, technical articles for developers who want to get more performance out of their Go code without relying on guesswork or cargo cult patterns.
https://web.dev/learn/performance
This course is designed for those new to web performance, a vital aspect of the user experience.
https://www.ibm.com/think/insights/application-performance-optimization
Application performance is not just a simple concern for most organizations; it’s a critical factor in their business’s success.
https://www.oreilly.com/library/view/optimizing-java/9781492039259/
Performance tuning is an experimental science, but that doesn’t mean engineers should resort to guesswork and folklore to get the job done.
C+
https://www.freecodecamp.org/chinese/news/the-c-beginners-handbook/
本手册遵循二八定律。你将在 20% 的时间内学习 80% 的 C 编程语言。
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic C Programming language book written by Brian Kernighan and Dennis Ritchie.
C+++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
中文,免费,零起点,完整示例,基于最新的Python 3版本。
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
算法+
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
TensorRT+
https://docs.nvidia.com/deeplearning/tensorrt/latest/getting-started/quick-start-guide.html
This TensorRT Quick Start Guide is a starting point for developers who want to try out the TensorRT SDK; specifically, it demonstrates how to quickly construct an application to run inference on a TensorRT engine.
LLVM+
https://llvm.org/docs/GettingStarted.html
Welcome to the LLVM project!
https://llvm.org/docs/tutorial/
This is the “Kaleidoscope” Language tutorial, showing how to implement a simple language using LLVM components in C++.
https://mcyoung.xyz/2023/08/01/llvm-ir/
“LLVM” is an umbrella name for a number of software components that can be used to build compilers.
https://www.youtube.com/watch?v=Lvc8qx8ukOI
This is the first lecture from the "Programming Language with LLVM" course where we build a full programming language similar to JavaScript from scratch, using LLVM compiler infrastructure.
相关职位
实习机器学习平台
1、研发面向大语言/多模态/CV/NLP等类型模型的推理与训练框架; 2、参与推理框架研发优化,包括算子优化、推理架构优化、异构调度等多种技术研发落地等; 3、参与训练框架研发优化,包括数据读取、分布式训练及微调工具链等AI基础设施的建设等; 4、参与多个业务场景中的模型压缩技术实现,对模型进行轻量化压缩,提高训练/推理效率,支持业务降本增效; 5、与公司各算法部门深度合作,参与大语言模型、多模态大模型、计算机视觉、语音、自然语言处理等业务训推任务的优化提效; 6、深度参与周边深度学习系统多个子方向的工作,包括但不限于模型管理、推理部署、日志/监控、工作流编排等。
校招J1014
1、参与快手大规模深度学习推理引擎、大模型训练解决方案的研发与优化,包括大模型推理、模型训练框架、微调平台等; 2、参与底层算子的优化、通过优化访存pattern、计算提升推理性能,与算法部门合作,为公司大模型定制训练方案,探索RLHF、MoE、多模态、longcontext等前沿方向,提升训练性能; 3、优化推理框架上层调度策略,通过机内、机间的计算任务调度和通讯优化提升引擎性能;优化现有大语言模型相关工具和平台,提高模型训练、维护效率,降低成本,提升训练服务稳定性。
更新于 2025-07-30