Tencent Hunyuan Large Model Inference Acceleration Engineer - Beijing/Shenzhen
Experienced hire · Full-time · 5+ years · TEG · Common Technology · Location: Beijing · Status: Hiring
Requirements
1. Proficient in C/C++ and Python, with a background in computer architecture or software development; familiar with system performance tuning methods.
2. Basic GPU programming skills, including but not limited to CUDA and OpenCL; familiar with at least one GPU acceleration library, such as cuBLAS, cuDNN, or CUTLASS.
3. Hands-on experience with deep learning inference frameworks such as TensorRT, FasterTransformer, TensorRT-LLM, or vLLM.
4. Familiar with the low-level implementation details of common deep learning networks and operators; hands-on experience debugging and tuning models for training and inference is a plus.
5. Familiar with methods for analyzing CPU/GPU heterogeneous acceleration bottlenecks; experience with server-side AI chips or GPU acceleration is a plus.
6. Familiar with common acceleration methods for distributed inference; experience with distributed deployment of very large models is a plus.
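Requirement 6's distributed-inference acceleration can be illustrated with a minimal, dependency-free sketch of tensor parallelism: a weight matrix is split column-wise across hypothetical devices, each "device" computes its shard of the matmul independently, and the partial outputs are concatenated. All names here (`split_columns`, `parallel_linear`) are illustrative, not from any real framework.

```python
# Toy tensor parallelism for a linear layer: split the weight matrix
# column-wise across "devices", compute each shard independently,
# then concatenate the partial outputs (an "all-gather").

def matmul(x, w):
    """Plain dense matmul: x is (m, k), w is (k, n)."""
    m, k, n = len(x), len(w), len(w[0])
    return [[sum(x[i][t] * w[t][j] for t in range(k)) for j in range(n)]
            for i in range(m)]

def split_columns(w, parts):
    """Split weight matrix w column-wise into `parts` equal shards."""
    step = len(w[0]) // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]

def parallel_linear(x, w, parts):
    """Column-parallel linear layer: each 'device' owns one shard."""
    shards = split_columns(w, parts)
    outputs = [matmul(x, shard) for shard in shards]  # one matmul per device
    # All-gather: concatenate partial outputs along the column axis.
    return [sum((out[i] for out in outputs), []) for i in range(len(x))]

x = [[1.0, 2.0]]
w = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 1.0, 0.0, 2.0]]
assert parallel_linear(x, w, parts=2) == matmul(x, w)
```

The sharded result matches the dense matmul exactly; in a real system each shard would live on a separate GPU and the concatenation would be a collective communication step.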
Responsibilities
1. Work with algorithm engineers to bring deep learning algorithms into production, building high-throughput, low-latency inference systems.
2. Optimize large-model inference performance, increasing throughput while controlling cost.
3. Improve the large-model inference framework, making it easier to use and debug.
Includes English materials
C
https://www.freecodecamp.org/chinese/news/the-c-beginners-handbook/
This handbook follows the 80/20 rule: you will learn 80% of the C programming language in 20% of the time.
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic C Programming language book written by Brian Kernighan and Dennis Ritchie.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, beginner-friendly, with complete examples, based on the latest Python 3.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Performance Tuning
https://goperf.dev/
The Go App Optimization Guide is a series of in-depth, technical articles for developers who want to get more performance out of their Go code without relying on guesswork or cargo cult patterns.
https://web.dev/learn/performance
This course is designed for those new to web performance, a vital aspect of the user experience.
https://www.ibm.com/think/insights/application-performance-optimization
Application performance is not just a simple concern for most organizations; it’s a critical factor in their business’s success.
https://www.oreilly.com/library/view/optimizing-java/9781492039259/
Performance tuning is an experimental science, but that doesn’t mean engineers should resort to guesswork and folklore to get the job done.
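The "experimental science, not guesswork" theme running through these tuning resources can be demonstrated with Python's standard `timeit` module: implement the same task two ways, measure both under identical conditions, and let the numbers decide. The two candidate functions below are illustrative, not taken from any of the linked materials.

```python
import timeit

# Two candidate implementations of the same task: build "0,1,...,999".
def build_concat():
    s = ""
    for i in range(1000):
        s += str(i) + ","
    return s.rstrip(",")

def build_join():
    return ",".join(str(i) for i in range(1000))

# Performance tuning is experimental: measure both, don't assume.
t_concat = timeit.timeit(build_concat, number=200)
t_join = timeit.timeit(build_join, number=200)
print(f"concat: {t_concat:.4f}s  join: {t_join:.4f}s")
assert build_concat() == build_join()  # same result, different cost
```

The same discipline applies at every level of the stack, from string handling to GPU kernels: form a hypothesis, benchmark, and only then optimize.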
CUDA
https://developer.nvidia.com/blog/even-easier-introduction-cuda/
This post is a super simple introduction to CUDA, the popular parallel computing platform and programming model from NVIDIA.
https://www.youtube.com/watch?v=86FAWCzIe_4
Learn how to program with NVIDIA CUDA and leverage GPUs for high-performance computing and deep learning.
OpenCL
https://developer.nvidia.com/opencl
OpenCL™ (Open Computing Language) is a low-level API for heterogeneous computing that runs on CUDA-powered GPUs.
https://engineering.purdue.edu/~smidkiff/ece563/NVidiaGPUTeachingToolkit/Mod20OpenCL/3rd-Edition-AppendixA-intro-to-OpenCL.pdf
A brief overview of OpenCL for CUDA programmers.
[English] Hands On OpenCL
https://handsonopencl.github.io/
An open source two-day lecture course for teaching and learning OpenCL
https://leonardoaraujosantos.gitbook.io/opencl/chapter1
Open Computing Language is a framework for writing programs that execute across heterogeneous platforms.
https://ulhpc-tutorials.readthedocs.io/en/latest/gpu/opencl/
OpenCL came as a standard for heterogeneous programming that enables a code to run in different platforms.
https://www.youtube.com/watch?v=4q9fPOI-x80
This presentation will show how to make use of the GPU from Java using OpenCL.
Large Language Models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
vLLM
https://www.newline.co/@zaoyang/ultimate-guide-to-vllm--aad8b65d
vLLM is a framework designed to make large language models faster, more efficient, and better suited for production environments.
https://www.youtube.com/watch?v=Ju2FrqIrdx0
vLLM is a cutting-edge serving engine designed for large language models (LLMs), offering unparalleled performance and efficiency for AI-driven applications.
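vLLM's headline optimization is PagedAttention: KV-cache memory is managed in fixed-size blocks, analogous to virtual-memory pages, so sequences of varying lengths waste little space. Below is a toy, framework-free sketch of the block-allocation idea; the class and method names are illustrative and are not vLLM's actual API.

```python
# Toy sketch of paged KV-cache allocation in the spirit of vLLM's
# PagedAttention: each sequence maps logical token positions to
# fixed-size physical blocks, allocated on demand from a shared pool.
# Illustrative only; not vLLM's real implementation.

BLOCK_SIZE = 16  # tokens per KV-cache block

class BlockAllocator:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def allocate(self):
        if not self.free:
            raise MemoryError("KV cache exhausted")
        return self.free.pop()

    def release(self, block_id):
        self.free.append(block_id)

class Sequence:
    """Tracks which physical blocks hold this sequence's KV cache."""
    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self):
        # Allocate a new block only when the current one is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def free(self):
        for b in self.block_table:
            self.allocator.release(b)
        self.block_table = []

alloc = BlockAllocator(num_blocks=8)
seq = Sequence(alloc)
for _ in range(40):  # 40 tokens -> ceil(40 / 16) = 3 blocks
    seq.append_token()
assert len(seq.block_table) == 3
```

Because blocks are allocated on demand and returned to a shared pool when a sequence finishes, memory fragmentation stays low and more concurrent sequences fit on one GPU, which is where vLLM's throughput gains come from.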
Deep Learning
https://d2l.ai/
Interactive deep learning book with code, math, and discussions.
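Books like d2l.ai build everything on the same core training loop: compute a loss, take its gradient, and step the parameters. A minimal, dependency-free example of that loop, fitting y = 2x with a single weight (purely illustrative):

```python
# Minimal gradient descent: fit y = 2x with one weight w, using
# mean squared error and a hand-derived gradient.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0    # initial weight
lr = 0.02  # learning rate

for _ in range(200):
    # d/dw of mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad

print(round(w, 3))  # converges close to the true slope 2.0
```

Real networks replace the scalar `w` with millions of parameters and the hand-derived gradient with automatic differentiation, but the loop is the same.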
TensorRT
https://docs.nvidia.com/deeplearning/tensorrt/latest/getting-started/quick-start-guide.html
This TensorRT Quick Start Guide is a starting point for developers who want to try out the TensorRT SDK; specifically, it demonstrates how to quickly construct an application to run inference on a TensorRT engine.
Related Positions
Experienced hire · 1+ years · Common Technology
1. Work with algorithm engineers to bring deep learning algorithms into production, building high-throughput, low-latency inference systems.
2. Optimize large-model inference performance, increasing throughput while controlling cost.
3. Improve the large-model inference framework, making it easier to use and debug.
Updated 2025-10-20
Experienced hire · TEG · Technology
1. Research compression and acceleration methods for LLMs and multimodal large models, including speculative decoding, sparsification, quantization, and distillation.
2. Design deployable large-model compression algorithms and cost optimization schemes to accelerate large-model performance.
3. Analyze business performance bottlenecks and model characteristics, build customized large-model compression and optimization tools, and deliver high-speed inference solutions.
Updated 2025-05-26
Experienced hire · Tencent Cloud (TEG)
1. Train and optimize foundation models for multimodal generation, including image generation, video generation, iterative multimodal generation, and multimodal editing.
2. Build core components of foundation models, such as Diffusion Models and Autoregressive Models.
3. Design and implement data science for large models, and accelerate large-model training and inference, ensuring the foundation models stay competitive and land smoothly.
Updated 2025-06-16