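Qwen | Qwen Business Unit - Senior Expert, Large Model Compression & Inference Acceleration - Hangzhou/Beijing/Guangzhou
Experienced hire | Full-time | 3+ years | Location: Beijing | Hangzhou | Guangzhou | Status: Hiring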
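Requirements
1. In-depth research experience or large-scale engineering deployment experience in at least one of the following areas: quantization, pruning, sparsity, distillation, speculative decoding, KV Cache compression, or token compression;
2. Familiarity with mainstream low-bit quantization methods, including FP8, FP4, INT8, INT4, PTQ, QAT, SmoothQuant, AWQ, GPTQ, KV Cache quantization, and mixed-precision strategies, and the ability to analyze and resolve accuracy degradation in low-bit deployment…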
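Responsibilities
1. Take part in R&D on large model compression and inference acceleration, designing and implementing low-cost, low-latency, high-throughput model optimization solutions for language models, multimodal models, MoE models, and agentic inference scenarios;
2. Develop and deploy low-bit quantization techniques, including FP8, FP4, INT8, INT4, KV Cache quantization, MoE quantization, mixed-precision quantization, and QAT/PTQ, and resolve accuracy, stability, and performance problems in low-bit deployment of large-scale models;
3. Take part in exploring and deploying inference acceleration techniques such as speculative decoding, sparsification, pruning, distillation, token pruning, prompt compression, CoT compression, and KV Cache compression, continuously improving decode efficiency while reducing GPU memory and compute overhead;
4. Build and refine evaluation and regression mechanisms for model compression quality; work with model, operator, framework, and business teams to close the loop from algorithm design to online deployment; and take ownership of model quality, inference cost, and throughput gains.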
Includes English-language materials
Caching
https://hackernoon.com/the-system-design-cheat-sheet-cache
The cache is a layer that stores a subset of data, typically the most frequently accessed or essential information, in a location quicker to access than its primary storage location.
https://www.youtube.com/watch?v=bP4BeUjNkXc
Caching strategies, Distributed Caching, Eviction Policies, Write-Through Cache and Least Recently Used (LRU) cache are all important terms when it comes to designing an efficient system with a caching layer.
https://www.youtube.com/watch?v=dGAgxozNWFE
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, beginner-friendly, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
PyTorch
https://datawhalechina.github.io/thorough-pytorch/
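PyTorch is an important tool for data science research with deep learning. It offers considerable advantages in flexibility, readability, and performance, and in recent years it has become the framework most commonly used in academia to implement deep learning algorithms.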
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
Large models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g