快手【留用实习】异构计算平台优化工程师
实习兼职J1020地点:上海 | 北京状态:招聘
任职要求
1、本科及以上学历,深入理解处理器体系结构(X86/ARM)或者常见GPGPU/NPU系统架构,了解CPU/GPU微架构、PMU等相关子领域; 2、对AI领域的基本理论与常见模型算法有深刻理解,熟练使用tensorflow或pytorch进行模型训练或tensorrt/tvm做推理优化,对使用GPU做AI算法加速有相关经历,熟悉GPU CUDA编程; 3、深入理解操作系统架构和实现原理,熟练掌握问题定位手段(perf、SystemTap、eBPF),精通软硬件系统性能分析及优化; 4、熟悉Linux kernel、虚拟化系统(KVM/QEMU/VirtIO)、内存管理、进程管理、I/O软件栈; 5、熟悉数据中心常见平台软件维护开发,例如:K8s,Hadoop/Spark,分布式存储(CephFS/HDFS)或存储引擎(InnoDB/RocksDB); 6、具备较强的逻辑思考能力、沟通能力、学习能力、合作精神,积极主动,有责任心,抗压性强。 加分项: 1、有GPU/NPU上的AI编译器/算子加速库/集合通信库开发经验; 2、有CPU/GPU模拟器/C-module开发经验; 3、熟悉新型硬件,有智能板卡、控制器固件、驱动、Open Channel、SPDK、DPDK等研发和应用经验。
工作职责
1、负责依据不同业务场景的特点和新硬件特性,结合系统软硬件栈的整体调优,提出并实施性能优化方案; 2、负责持续跟踪业内软硬件相关领域的技术发展趋势,结合不同业务场景未来需求,开展方案预研以及推广应用工作。 具体包括以下两种场景或者两种之一: 1)AI计算相关场景,例如:大模型训练场景,AIGC、NLP、推荐等常规推理场景; 2)以容器云、大数据计算平台为例的通用计算平台场景。
包括英文材料
学历+
算法+
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
TensorFlow+
https://www.youtube.com/watch?v=tpCFfeUEGs8
Ready to learn the fundamentals of TensorFlow and deep learning with Python? Well, you’ve come to the right place.
https://www.youtube.com/watch?v=ZUKz4125WNI
This part continues right where part one left off so get that Google Colab window open and get ready to write plenty more TensorFlow code.
PyTorch+
https://datawhalechina.github.io/thorough-pytorch/
PyTorch是利用深度学习进行数据科学研究的重要工具,在灵活性、可读性和性能上都具备相当的优势,近年来已成为学术界实现深度学习算法最常用的框架。
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
TensorRT+
https://docs.nvidia.com/deeplearning/tensorrt/latest/getting-started/quick-start-guide.html
This TensorRT Quick Start Guide is a starting point for developers who want to try out the TensorRT SDK; specifically, it demonstrates how to quickly construct an application to run inference on a TensorRT engine.
CUDA+
https://developer.nvidia.com/blog/even-easier-introduction-cuda/
This post is a super simple introduction to CUDA, the popular parallel computing platform and programming model from NVIDIA.
https://www.youtube.com/watch?v=86FAWCzIe_4
Lean how to program with Nvidia CUDA and leverage GPUs for high-performance computing and deep learning.
Perf+
https://perfwiki.github.io/main/
perf is powerful: it can instrument CPU performance counters, tracepoints, kprobes, and uprobes (dynamic tracing).
https://www.brendangregg.com/bpf-performance-tools-book.html
This book can help you get the most out of your systems and applications, helping you improve performance, reduce costs, and solve software issues.
[英文] perf Examples
https://www.brendangregg.com/perf.html
These are some examples of using the perf Linux profiler, which has also been called Performance Counters for Linux (PCL), Linux perf events (LPE), or perf_events.
https://www.youtube.com/watch?v=M6ldFtwWup0
eBPF+
https://ebpf.io/get-started/
eBPF is a revolutionary technology that can run sandboxed programs in the Linux kernel without changing kernel source code or loading a kernel module.
Linux+
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
内核+
https://www.youtube.com/watch?v=C43VxGZ_ugU
I rummage around the Linux kernel source and try to understand what makes computers do what they do.
https://www.youtube.com/watch?v=HNIg3TXfdX8&list=PLrGN1Qi7t67V-9uXzj4VSQCffntfvn42v
Learn how to develop your very own kernel from scratch in this programming series!
https://www.youtube.com/watch?v=JDfo2Lc7iLU
Denshi goes over a simple explanation of what computer kernels are and how they work, alonside what makes the Linux kernel any special.
Kubernetes+
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
本教程介绍 Kubernetes 集群编排系统的基础知识。每个模块包含关于 Kubernetes 主要特性和概念的一些背景信息,还包括一个在线教程供你学习。
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
Hadoop+
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop 为庞大的计算机集群提供可靠的、可伸缩的应用层计算和存储支持,它允许使用简单的编程模型跨计算机群集分布式处理大型数据集,并且支持在单台计算机到几千台计算机之间进行扩展。
[英文] Hadoop Tutorial
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models.
Spark+
[英文] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
HDFS+
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
https://www.ibm.com/cn-zh/think/topics/hdfs
Hadoop 分布式文件系统 (HDFS) 是一种管理大型数据集的文件系统,可在商用硬件上运行。
RocksDB+
https://rocksdb.org/docs/getting-started.html
The RocksDB library provides a persistent key value store.
C+
https://www.freecodecamp.org/chinese/news/the-c-beginners-handbook/
本手册遵循二八定律。你将在 20% 的时间内学习 80% 的 C 编程语言。
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic C Programming language book written by Brian Kernighan and Dennis Ritchie.
相关职位
实习J1020
1.参与AI与GPU相关项目的性能优化与研发,通过利用CPU、GPU的并行计算优化、架构优化、量化优化和异构调度等高性能优化技术,研发行业领先的高性能异构AI优化技术与编译优化技术; 2.针对搜索、推荐、广告、音视频以及大模型场景,优化模型训练和推理场景的性能; 3.与公司各算法部门深度合作,对重点项目进行算法与系统的联合优化。
更新于 2025-03-31
实习J1014
1、负责容器云平台的一个或多个领域的设计与开发; 2、基于Kubernetes完善统一调度、多集群联邦能力,提升集群运维效率; 3、基于但不限于servicemesh技术栈,实现微服务架构业务&离线计算任务的流量管控、链路追踪等基础能力; 4、负责公司混合计算平台及相关技术的设计与开发,提升异构资源管理效率; 5、结合容器领域前沿技术,负责容器云全局技术优化与落地实践。
更新于 2025-05-20
实习J1020
1. 负责分布式大语言模型 (LLM) 推理系统的底层基础设施研究与探索,包括 GPU 和 RDMA 等,提升 GPU 环境下的稳定性和计算效率; 2. 负责大规模模型训练场景优化工作,通过建设全面的异常发现、故障自愈机制,提升平台训练 MFU,降低训练成本; 3. 基于容器以及 Kubernetes 技术,负责对机器学习领域中的资源调度、模型训练、模型推理、数据管理等多个子方向的成本效率优化工作; 4. 持续关注并跟进业界技术发展,比如超长上下文、思维链、多模态方向;
更新于 2025-03-31