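Kuaishou AI Performance Optimization Engineer (Beijing/Hangzhou/Shenzhen)
Experienced hire · Full-time · D7198 · Location: Beijing · Status: Hiring
Requirements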
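1. In-depth experience in at least one of graph optimization, quantization, operator optimization, or a related technical area.
2. Proficient in Python or similar programming languages, with a solid grounding in data structures and algorithms.
3. Familiar with at least one of vLLM, SGLang, torch.compile, XLA, Triton, TensorRT, or TensorRT-LLM; hands-on development experience with them is preferred.
4. Familiar with high-performance computing optimization on GPU platforms (Nvidia/AMD), with a deep understanding of GPU hardware; familiar with parallel computing optimization, memory access optimization, and low-bit computation, and with using Nsight Systems / Nsight Compute for performance analysis.
5. Understand the basic principles of deep learning algorithms; familiar with basic neural network architectures and how their operators are computed; know at least one deep learning training framework, such as PyTorch or TensorFlow, and how its model files are parsed.
6. Experience using GPUs to accelerate AI algorithms; familiar with CUDA programming, with solid development skills; familiarity with Triton and CUTLASS and experience developing operator libraries are preferred.
7. Familiar with the LLM inference technology stack; familiarity with distributed communication principles such as TP/PP/DP is preferred.
8. Able to solve problems independently and to abstract and decompose business logic sensibly; a good team player.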
Responsibilities
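1. Contribute to performance optimization and R&D on AI and GPU projects, applying high-performance techniques such as parallel computing optimization, architecture optimization, quantization, and heterogeneous scheduling to develop industry-leading heterogeneous AI optimization and compiler optimization technology.
2. Optimize large-model training and inference performance for search, advertising, and recommendation, audio/video, and large-model scenarios.
3. Work closely with the company's algorithm teams on joint algorithm-system optimization for key projects.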
Includes English-language materials
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for absolute beginners, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Data structures
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
A full-course tutorial on data structures and algorithms in Java.
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Large language models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
vLLM
https://www.newline.co/@zaoyang/ultimate-guide-to-vllm--aad8b65d
vLLM is a framework designed to make large language models faster, more efficient, and better suited for production environments.
https://www.youtube.com/watch?v=Ju2FrqIrdx0
vLLM is a cutting-edge serving engine designed for large language models (LLMs), offering unparalleled performance and efficiency for AI-driven applications.
TensorRT
https://docs.nvidia.com/deeplearning/tensorrt/latest/getting-started/quick-start-guide.html
This TensorRT Quick Start Guide is a starting point for developers who want to try out the TensorRT SDK; specifically, it demonstrates how to quickly construct an application to run inference on a TensorRT engine.
Nsight
https://developer.nvidia.com/tools-tutorials
NVIDIA Nsight™ Developer tools are a suite of tools for building, profiling, and debugging accelerated applications.
https://www.youtube.com/watch?v=aQ1NYoRvp7o
Profile Python for AI and deep learning applications with NVIDIA's suite of Nsight Developer Tools.
https://www.youtube.com/watch?v=Iuy_RAvguBM
Join NVIDIA’s Jackson Marusarz for an introduction to NVIDIA Nsight Compute, a tool for in-depth analysis of CUDA kernel performance on GPUs.
Deep learning
https://d2l.ai/
Interactive deep learning book with code, math, and discussions.
PyTorch
https://datawhalechina.github.io/thorough-pytorch/
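PyTorch is an important tool for data science research with deep learning. It offers considerable advantages in flexibility, readability, and performance, and in recent years it has become the framework most commonly used in academia to implement deep learning algorithms.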
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
TensorFlow
https://www.youtube.com/watch?v=tpCFfeUEGs8
Ready to learn the fundamentals of TensorFlow and deep learning with Python? Well, you’ve come to the right place.
https://www.youtube.com/watch?v=ZUKz4125WNI
This part continues right where part one left off so get that Google Colab window open and get ready to write plenty more TensorFlow code.
CUDA
https://developer.nvidia.com/blog/even-easier-introduction-cuda/
This post is a super simple introduction to CUDA, the popular parallel computing platform and programming model from NVIDIA.
https://www.youtube.com/watch?v=86FAWCzIe_4
Learn how to program with Nvidia CUDA and leverage GPUs for high-performance computing and deep learning.
Related positions
Campus hire · J1020
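1. Contribute to performance optimization and R&D on AI and GPU projects, applying high-performance techniques such as CPU and GPU parallel computing optimization, architecture optimization, quantization, and heterogeneous scheduling to develop industry-leading heterogeneous AI optimization and compiler optimization technology; 2. Optimize model training and inference performance for search, recommendation, advertising, audio/video, and large-model scenarios; 3. Work closely with the company's algorithm teams on joint algorithm-system optimization for key projects.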
Updated 2025-07-30
Experienced hire · 2+ years · A207604A
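1. Help build the capabilities of Douyin's R&D efficiency platform, owning system design and core code development; 2. Support intelligent tooling for Douyin's engineering efficiency, red/blue team exercises, risk governance, and related areas; 3. Bring deep understanding and hands-on practice in front-end engineering systems, development standards, componentization, and testing; 4. Pursue the highest standard of product stability and performance, with a deep understanding of and commitment to optimization and refactoring so that systems run efficiently and stably; 5. Follow the latest developments and trends in AI and, based on developers' actual needs, deliver high-performance, adaptable technical solutions.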
Updated 2024-09-14
Experienced hire · A171311A
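1. Design and implement LLM-based agent architectures, including task planning, dialogue management, intent recognition, and flow engineering; 2. Design and implement multimodal agents that handle text, speech, image, and other input/output forms; 3. Drive continuous improvement of AI Agent architecture and performance, improving the Agent's understanding of users and the accuracy of its responses; 4. Develop and maintain the agents' backend services, ensuring system stability and scalability; 5. Track and study cutting-edge AI algorithms in the industry to keep raising the AI Agent's technical level; 6. Write the necessary technical documentation, including API interface descriptions, core algorithm design, and code development.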
Updated 2025-01-08