Ctrip (携程) AI Infra R&D Engineer (GPU Inference) (MJ034962)
Experienced hire | Full-time | 1+ years of experience | Accommodation Business AI & BI | Location: Shanghai | Status: Hiring
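Requirements
Bachelor's degree or above in computer science or a related field, with at least 1 year of R&D experience in GPU, high-performance computing, or deep learning inference.
Familiar with CUDA programming fundamentals and GPU architecture (SM, Warp, Memory Hierarchy, Tensor Core), with hands-on experience writing and optimizing CUDA kernels.
Familiar with at least one mainstream deep learning inference framework (TensorRT / ONNX Runtime / TVM / Triton Inference Server), and with the basic principles of graph optimization, operator fusion, and quantization.
Understands the structure and inference characteristics of mainstream recommendation models (DLRM, DIN, generative recommendation, etc.) or Transformer-based models, with some awareness of where model performance bottlenecks arise.
Proficient in at least one language such as C++, Python, or Java; familiar with the Linux development environment; code…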
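Responsibilities
Contribute to the development of the GPU inference engine for the recommendation system, supporting the rollout of online inference services for business scenarios such as generative recommendation, ranking, and recall.
Contribute to CUDA operator development and optimization, including operator fusion, quantization (INT8/FP8), Tensor Core utilization, and GPU memory and memory-access optimization, to keep improving single-GPU throughput and inference latency.
Contribute to inference graph optimization: using mainstream frameworks such as TensorRT / ONNX Runtime / TVM / Triton, carry out graph transformations, operator replacement, and kernel tuning, and help bring models into production efficiently.
Given the characteristics of recommendation models (sparse Embeddings, variable-length sequences, multi-tower architectures, etc.), help develop and tune customized inference solutions and resolve performance bottlenecks such as Host-Device transfer and KV Cache management.
Take part in performance profiling and tuning, using tools such as Nsight and CUPTI proficiently for performance analysis, and work with the algorithm team to evaluate the performance of model architectures.
Follow industry developments in GPU inference, LLM Serving, and recommendation system Infra (vLLM, SGLang, FlashAttention, etc.), and actively learn and help bring new technologies into the team.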
Related English-language learning materials
Deep learning
https://d2l.ai/
Interactive deep learning book with code, math, and discussions.
CUDA
https://developer.nvidia.com/blog/even-easier-introduction-cuda/
This post is a super simple introduction to CUDA, the popular parallel computing platform and programming model from NVIDIA.
https://www.youtube.com/watch?v=86FAWCzIe_4
Learn how to program with Nvidia CUDA and leverage GPUs for high-performance computing and deep learning.
Kernel
https://www.youtube.com/watch?v=C43VxGZ_ugU
I rummage around the Linux kernel source and try to understand what makes computers do what they do.
https://www.youtube.com/watch?v=HNIg3TXfdX8&list=PLrGN1Qi7t67V-9uXzj4VSQCffntfvn42v
Learn how to develop your very own kernel from scratch in this programming series!
https://www.youtube.com/watch?v=JDfo2Lc7iLU
Denshi goes over a simple explanation of what computer kernels are and how they work, alongside what makes the Linux kernel special.
TensorRT
https://docs.nvidia.com/deeplearning/tensorrt/latest/getting-started/quick-start-guide.html
This TensorRT Quick Start Guide is a starting point for developers who want to try out the TensorRT SDK; specifically, it demonstrates how to quickly construct an application to run inference on a TensorRT engine.
ONNX
https://github.com/onnx/tutorials
Open Neural Network Exchange (ONNX) is an open standard format for representing machine learning models.
Introduction to ONNX
https://onnx.ai/onnx/intro/
This documentation describes the ONNX concepts (Open Neural Network Exchange).
Triton Inference Server
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
Triton Inference Server is an open source inference serving software that streamlines AI inferencing.
Transformer
https://huggingface.co/learn/llm-course/en/chapter1/4
Breaking down how Large Language Models work, visualizing how data flows through.
https://poloclub.github.io/transformer-explainer/
An interactive visualization tool showing you how transformer models work in large language models (LLM) like GPT.
https://www.youtube.com/watch?v=wjZofJX0v4M
Breaking down how Large Language Models work, visualizing how data flows through.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8