Meituan [Foundation LLM Beidou (北斗) Internship] High-Performance Computing and LLM System Optimization
Internship / Part-time | Core Local Commerce - Basic R&D Platform | Location: Beijing, Shanghai | Status: Hiring
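Requirements
1. GPU programming: has written CUDA kernels and understands warps, SMs, and the GPU memory hierarchy; 2. Training frameworks: has used or modified Megatron/DeepSpeed/FSDP, beyond just running a demo; 3. Communication systems: understands how NCCL works, or…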
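Responsibilities
Overview: Participate in high-performance computing and system optimization for large-model training and inference. Depending on your background and research interests, you can choose one of the following directions to pursue in depth:
1. Operator development and extreme performance optimization for GPU, NPU, and other hardware platforms (CUDA/Cutlass/TileLang).
2. Track frontier model algorithms and deliver high-performance engineering implementations; work deep in the framework layer on system-level performance tuning such as operator fusion, memory/communication optimization, and pipeline scheduling.
3. Explore LLM-based automatic synthesis of high-performance kernels; following the LLM-driven Kernel Generation paradigm, research end-to-end automatic generation and iterative optimization of efficient GPU/NPU kernels.
4. Research compiler optimization and auto-tuning techniques based on DSLs (such as Triton, TVM TIR, and Halide).
5. Build a unified operator library across multiple hardware backends (NVIDIA and Chinese domestic chips), including migration and adaptation.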
Related English-language materials
CUDA
https://developer.nvidia.com/blog/even-easier-introduction-cuda/
This post is a super simple introduction to CUDA, the popular parallel computing platform and programming model from NVIDIA.
https://www.youtube.com/watch?v=86FAWCzIe_4
Learn how to program with Nvidia CUDA and leverage GPUs for high-performance computing and deep learning.
Kernel
https://www.youtube.com/watch?v=C43VxGZ_ugU
I rummage around the Linux kernel source and try to understand what makes computers do what they do.
https://www.youtube.com/watch?v=HNIg3TXfdX8&list=PLrGN1Qi7t67V-9uXzj4VSQCffntfvn42v
Learn how to develop your very own kernel from scratch in this programming series!
https://www.youtube.com/watch?v=JDfo2Lc7iLU
Denshi goes over a simple explanation of what computer kernels are and how they work, alongside what makes the Linux kernel special.
Megatron
https://www.youtube.com/watch?v=hc0u4avAkuM