DingTalk - AI Infra Architect - Large Model Inference Optimization
Experienced hire | Full-time | 5+ years | Technical - Development | Location: Hangzhou | Status: Open
Qualifications
1. Solid programming fundamentals: proficient in Python, C++, or Go, with strong engineering skills and good code style.
2. Familiarity with deep learning frameworks: hands-on with at least one mainstream framework (PyTorch, TensorFlow, etc.) and a working understanding of model structures and the training workflow.
3. Extensive inference-framework experience: deep understanding of and practical experience with mainstream inference frameworks such as vLLM, SGLang, TensorRT, ONNX Runtime, TVM, and OpenVINO; real project delivery experience is a plus.
4. Command of model optimization methods: familiar with common techniques including but not limited to quantization (PTQ/QAT), pruning, and knowledge distillation, and aware of their impact on model accuracy and performance.
5. Hardware performance tuning ability: performance tuning experience on at least one hardware platform (GPU, Ascend, Hygon, etc.); familiarity with the relevant hardware architecture and programming model (e.g., CUDA) is a plus.
6. Strong overall qualities: excellent analytical and problem-solving skills, good communication and teamwork, a strong sense of ownership, and self-motivation.
Responsibilities
1. Inference engine application and optimization: own the performance optimization and deployment of AI models, using mainstream inference frameworks (vLLM, SGLang, TensorRT, ONNX Runtime, TVM, OpenVINO, etc.) to accelerate them.
2. Deep model performance analysis and tuning: for the company's business scenarios, perform in-depth performance analysis of various AI models (including but not limited to LLM, VL, ASR, and TTS models) on different hardware platforms (GPU, PPU, Ascend, Hygon, etc.), locate bottlenecks, and devise optimization plans.
3. Model compression in practice: apply quantization (PTQ/QAT), pruning, distillation, and related techniques to maximize inference speed and reduce resource consumption while meeting business accuracy requirements (a minimal PTQ sketch follows this list).
4. Automated deployment and MLOps: build and improve automated pipelines for model deployment, monitoring, and iteration (CI/CD, MLOps) so that models serve internal and external customers stably and efficiently.
5. Tracking and adopting frontier technology: continuously follow state-of-the-art AI inference acceleration techniques, and evaluate and introduce new optimization approaches, tools, and hardware (e.g., LLM inference optimization libraries, new AI chips) to keep improving the deployment efficiency and runtime performance of business models.
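As a concrete illustration of the PTQ workflow mentioned in item 3, here is a minimal sketch of post-training dynamic quantization using the standard PyTorch API; the model and layer choices are placeholders for illustration only.

import torch
import torch.nn as nn

# Placeholder model standing in for a real business model.
model = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(),
    nn.Linear(1024, 256),
)
model.eval()

# Post-training dynamic quantization: weights of the listed module types are
# stored as int8, and activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print("fp32 output:", model(x).shape, "int8 output:", quantized(x).shape)
# In practice, compare serialized sizes and task accuracy before/after,
# since quantized weights are stored in packed form.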
Reference materials (including English-language resources)
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, starts from zero, with complete examples, based on the latest Python 3.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction to all of the core concepts in Python.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Go
https://www.youtube.com/watch?v=8uiZC0l4Ajw
A complete tutorial for learning Golang! Start to finish in under an hour, including a full demo of how to build an API in Go. No filler, just what you need to know.
Deep Learning
https://d2l.ai/
Interactive deep learning book with code, math, and discussions.
PyTorch
https://datawhalechina.github.io/thorough-pytorch/
PyTorch is an important tool for data science research with deep learning; it offers considerable advantages in flexibility, readability, and performance, and in recent years has become the framework most commonly used in academia to implement deep learning algorithms.
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
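The qualifications above emphasize understanding model structure and the training flow in a framework such as PyTorch; the snippet below is a minimal, self-contained training-loop sketch on synthetic data, intended only as a reminder of the moving parts (module, loss, optimizer, backward pass).

import torch
import torch.nn as nn

# Tiny regression model and synthetic data, for illustration only.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(256, 10)
y = torch.randn(256, 1)

for epoch in range(5):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = criterion(model(x), y)  # forward pass + loss
    loss.backward()                # backward pass: compute gradients
    optimizer.step()               # update parameters
    print(f"epoch {epoch}: loss={loss.item():.4f}")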
TensorFlow
https://www.youtube.com/watch?v=tpCFfeUEGs8
Ready to learn the fundamentals of TensorFlow and deep learning with Python? Well, you’ve come to the right place.
https://www.youtube.com/watch?v=ZUKz4125WNI
This part continues right where part one left off so get that Google Colab window open and get ready to write plenty more TensorFlow code.
vLLM
https://www.newline.co/@zaoyang/ultimate-guide-to-vllm--aad8b65d
vLLM is a framework designed to make large language models faster, more efficient, and better suited for production environments.
https://www.youtube.com/watch?v=Ju2FrqIrdx0
vLLM is a cutting-edge serving engine designed for large language models (LLMs), offering unparalleled performance and efficiency for AI-driven applications.
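As a quick orientation to the vLLM API referenced above, here is a minimal offline-inference sketch; the model name is a placeholder and the example assumes the weights can be pulled from the Hugging Face Hub.

from vllm import LLM, SamplingParams

# Placeholder model; substitute the model actually used in production.
llm = LLM(model="facebook/opt-125m")

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = [
    "Explain what PagedAttention is in one sentence.",
    "List two common LLM inference bottlenecks.",
]

# generate() batches the prompts and applies continuous batching internally.
outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)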
SGLang
[English] Install SGLang
https://docs.sglang.ai/get_started/install.html
SGLang is a fast serving framework for large language models and vision language models.
https://github.com/sgl-project/sgl-learning-materials
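A minimal way to exercise an SGLang deployment, assuming a server has already been launched locally (for example via python -m sglang.launch_server with a chosen model path and port) and that its OpenAI-compatible endpoint is reachable; the port and model name below are placeholders.

from openai import OpenAI

# Point the standard OpenAI client at the local SGLang server.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="default",  # should match the model name the server reports
    messages=[{"role": "user", "content": "Summarize what a KV cache is."}],
    max_tokens=64,
    temperature=0.7,
)
print(resp.choices[0].message.content)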
TensorRT
https://docs.nvidia.com/deeplearning/tensorrt/latest/getting-started/quick-start-guide.html
This TensorRT Quick Start Guide is a starting point for developers who want to try out the TensorRT SDK; specifically, it demonstrates how to quickly construct an application to run inference on a TensorRT engine.
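To make the quick-start flow concrete, below is a hedged sketch of building a serialized engine from an ONNX file, assuming the TensorRT 8.x Python bindings and an existing "model.onnx"; file names are placeholders.

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # enable FP16 kernels where supported

# Serialize the optimized engine so it can be deployed and reloaded later.
engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)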
ONNX
https://github.com/onnx/tutorials
Open Neural Network Exchange (ONNX) is an open standard format for representing machine learning models.
[English] Introduction to ONNX
https://onnx.ai/onnx/intro/
This documentation describes the ONNX concepts (Open Neural Network Exchange).
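To complement the ONNX format links above, here is a minimal ONNX Runtime inference sketch; the file name, input shape, and provider list are placeholders that depend on the exported model and available hardware.

import numpy as np
import onnxruntime as ort

# Prefer the CUDA provider when available, falling back to CPU.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Input names come from the exported graph.
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)

outputs = session.run(None, {input_name: dummy})
print("output shapes:", [o.shape for o in outputs])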
Performance Tuning
https://goperf.dev/
The Go App Optimization Guide is a series of in-depth, technical articles for developers who want to get more performance out of their Go code without relying on guesswork or cargo cult patterns.
https://web.dev/learn/performance
This course is designed for those new to web performance, a vital aspect of the user experience.
https://www.ibm.com/think/insights/application-performance-optimization
Application performance is not just a simple concern for most organizations; it’s a critical factor in their business’s success.
https://www.oreilly.com/library/view/optimizing-java/9781492039259/
Performance tuning is an experimental science, but that doesn’t mean engineers should resort to guesswork and folklore to get the job done.
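The role calls for locating bottlenecks on specific hardware; as one hedged example of that workflow in Python, the sketch below uses the PyTorch profiler to break down operator time for a toy model (the model and input sizes are placeholders).

import torch
import torch.nn as nn
from torch.profiler import profile, record_function, ProfilerActivity

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
x = torch.randn(64, 1024)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, record_shapes=True) as prof:
    with record_function("forward_pass"):
        for _ in range(10):
            model(x)

# Sort the operator table by self CPU time; use "cuda_time_total" on GPU.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))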
CUDA
https://developer.nvidia.com/blog/even-easier-introduction-cuda/
This post is a super simple introduction to CUDA, the popular parallel computing platform and programming model from NVIDIA.
https://www.youtube.com/watch?v=86FAWCzIe_4
Learn how to program with NVIDIA CUDA and leverage GPUs for high-performance computing and deep learning.
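The CUDA materials above are written around CUDA C++; to stay with Python (the language used for the other sketches here), the snippet below shows the same idea, a parallel element-wise add, using Numba's CUDA support instead. It assumes the numba package is installed and an NVIDIA GPU is present.

import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)      # global thread index
    if i < out.size:      # guard against out-of-range threads
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)

d_a = cuda.to_device(a)
d_b = cuda.to_device(b)
d_out = cuda.device_array_like(a)

threads_per_block = 256
blocks_per_grid = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks_per_grid, threads_per_block](d_a, d_b, d_out)

print("matches NumPy:", np.allclose(d_out.copy_to_host(), a + b))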
Related Positions
Experienced hire | 5+ years | Technical - Algorithms
1. Post-training framework R&D focused on LLM + RL: design the framework architecture and technical roadmap, and improve its scalability, stability, and efficiency.
2. Optimize framework performance (e.g., training speed, GPU memory usage) to reduce training cost and provide solid technical support for LLM + RL training.
3. Work with business teams to land LLM capabilities in business scenarios, customize training plans to business needs, and evaluate and validate models.
4. Own the planning, iteration, and framework maintenance of the IaaS infrastructure required for model training and inference, delivering a consistent, scalable, and highly reliable platform foundation.
Updated 2025-08-18
Experienced hire A103504B
1. Own the technical architecture design of cloud-native AI Infra for hybrid cloud, system optimization for inference scenarios, and development of cloud-native AI suites.
2. Develop the AI heterogeneous computing software stack, combining different hardware, high-performance networking, and caching techniques to optimize the full AI computing pipeline and build highly reliable, high-performance, and efficient AI compute infrastructure.
3. Track trends in AI and deep learning and actively participate in the design and development of next-generation AI infrastructure.
Updated 2025-02-26
Experienced hire A90640
1. Own the technical architecture design of cloud-native AI Infra for hybrid cloud, system optimization for inference scenarios, and development of cloud-native AI suites.
2. Develop the AI heterogeneous computing software stack, combining different hardware, high-performance networking, and caching techniques to optimize the full AI computing pipeline and build highly reliable, high-performance, and efficient AI compute infrastructure.
3. Track trends in AI and deep learning and actively participate in the design and development of next-generation AI infrastructure.
Updated 2025-02-26