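Kuaishou [Internship with Return Offer] AI Performance Optimization Engineer
Internship · Part-time · J1020 · Location: Hangzhou | Beijing · Status: Hiring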
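Requirements
1. Bachelor's degree or above in computer science or a related field.
2. In-depth research experience in at least one of graph optimization, quantization, or operator optimization.
3. Proficient in C++, with a solid grounding in data structures and algorithms; familiar with computer architecture and x86 assembly; familiar with Python.
4. Familiarity with one of XLA, MLIR, TVM, Triton, or TensorRT, with corresponding development experience, is preferred.
5. Familiar with high-performance computing optimization on CPU (ARM/x86) or GPU (Intel/Nvidia/AMD) platforms; deep understanding of computer architecture; familiar with parallel computing optimization, memory access optimization, low-bit computation, and related techniques.
6. Understand the basic principles of deep learning algorithms; familiar with basic neural network architectures and how their operators are computed; understand at least one deep learning training framework, such as PyTorch or TensorFlow, and how to parse its model files.
7. Experience using GPUs to accelerate AI algorithms; familiar with GPU CUDA programming.
8. Able to solve problems independently and to abstract and decompose business logic sensibly; a good team player.
9. Understanding of the principles of mainstream AIGC models; experience accelerating and optimizing AIGC models is preferred.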
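Responsibilities
1. Take part in performance optimization and R&D for AI and GPU projects, using high-performance techniques such as CPU and GPU parallel computing optimization, architecture optimization, quantization, and heterogeneous scheduling to develop industry-leading heterogeneous AI optimization and compiler optimization technology.
2. Optimize model training and inference performance for search, recommendation, advertising, audio/video, and large-model scenarios.
3. Work closely with the company's algorithm teams on joint algorithm-system optimization for key projects.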
Includes English-language materials
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Data structures
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data structures and algorithms full course tutorial in Java.
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, starts from zero, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
TensorRT
https://docs.nvidia.com/deeplearning/tensorrt/latest/getting-started/quick-start-guide.html
This TensorRT Quick Start Guide is a starting point for developers who want to try out the TensorRT SDK; specifically, it demonstrates how to quickly construct an application to run inference on a TensorRT engine.
Deep learning
https://d2l.ai/
Interactive deep learning book with code, math, and discussions.
PyTorch
https://datawhalechina.github.io/thorough-pytorch/
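PyTorch is an important tool for data science research with deep learning. It offers considerable advantages in flexibility, readability, and performance, and in recent years has become the framework most commonly used in academia to implement deep learning algorithms.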
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
TensorFlow
https://www.youtube.com/watch?v=tpCFfeUEGs8
Ready to learn the fundamentals of TensorFlow and deep learning with Python? Well, you’ve come to the right place.
https://www.youtube.com/watch?v=ZUKz4125WNI
This part continues right where part one left off so get that Google Colab window open and get ready to write plenty more TensorFlow code.
CUDA
https://developer.nvidia.com/blog/even-easier-introduction-cuda/
This post is a super simple introduction to CUDA, the popular parallel computing platform and programming model from NVIDIA.
https://www.youtube.com/watch?v=86FAWCzIe_4
Learn how to program with Nvidia CUDA and leverage GPUs for high-performance computing and deep learning.
Related positions
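Internship · J1020
1. Based on the characteristics of different business scenarios and on new hardware features, and through holistic tuning of the system hardware and software stack, propose and implement performance optimization plans.
2. Continuously track technology trends in hardware and software across the industry and, in light of the future needs of different business scenarios, carry out pre-research on solutions and promote their adoption.
The work covers both or either of the following scenarios:
1) AI computing scenarios, such as large-model training and routine inference for AIGC, NLP, and recommendation.
2) General-purpose computing platform scenarios, such as container cloud and big data computing platforms.
Updated 2025-03-11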
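Experienced hire · J1020
1. Take part in large-model inference and training optimization: develop industry-leading AI compiler technology to support training performance optimization on GPUs for search and recommendation scenarios, and support the deployment of large-model inference optimization techniques on heterogeneous hardware.
2. Take part in the functional development work required for large-model inference and in developing related compiler optimization features, using graph optimization, operator fusion, high-performance GPU operator development, automatic codegen, and similar techniques to keep pushing the limits of compute performance across different GPU models.
3. Help support day-to-day deployment of large-model inference services and take part in developing internal productivity tools.
Updated 2025-05-26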
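Internship · J1020
1. Based on the characteristics of different business scenarios and on new hardware features, and through holistic tuning of the system hardware and software stack, propose and implement performance optimization plans.
2. Continuously track technology trends in hardware and software across the industry and, in light of the future needs of different business scenarios, carry out pre-research on solutions and promote their adoption.
The work covers all three or any one of the following scenarios:
1) General-purpose computing platform scenarios, such as container cloud and big data computing platforms.
2) AI computing scenarios, such as large-model training and routine inference for AIGC, NLP, and recommendation.
3) Structured and unstructured data storage scenarios.
Updated 2025-03-04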