米哈游LLM Post-train 算法研究员
实习兼职程序&技术类地点:上海状态:招聘
任职要求
1)2027/2028 届在校硕士及以上学历,计算机科学、人工智能、机器学习、NLP 或相关专业 2)熟悉 Transformer / MoE 架构原理,熟练使用 PyTorch 及主流大模型训练/推理框架(如 DeepSpeed、Megatron-LM、VeRL、Slime、vLLM、SGLang 等) 3)具备扎实的强化学习基础,理解 PPO/DPO/GRPO 等算法原理,有将 RL 方法应用于语言模型对齐的实践经验 4)有 LLM 微调、对话系统训练或文本生成相关的研究或项目经验,了解分布式训练基础知识 5)具备较强的代码工程能力和实验设计能力,能够快速实现和验证算法思路 6)…
登录查看完整任职要求
微信扫码,1秒登录
工作职责
1)后训练算法研发:参与游戏内容、角色扮演等场景下大模型的后训练(Post-training)算法研发工作,涵盖 SFT、RLHF、DPO 等对齐方法的实现与优化,提升模型在剧情生成、角色一致性、对话连贯性、情感表达等维度的能力 2)奖励模型训练:参与 Reward Model 的设计与训练,探索面向对话质量、情感表达、角色一致性、安全性等维度的奖励信号构建,支撑强化学习训练流程 3)强化学习训练优化:参与基于 PPO/GRPO 等算法的大模型对齐训练,探索训练稳定性、采样效率和效果提升方法,支持模型在复杂多轮对话和开放域生成场景中的优化 4)数据工程:参与后训练阶段的数据构建工作,包括 SFT 数据设计、偏好数据采集与标注、数据清洗与质量评估,探索数据合成、数据增强与数据混合策略 5)多类型模型训练:参与辅助模型(如分类器、决策模型等)的训练与调优,支撑模型产品体系建设 6)实验与迭代:完成训练实验的设计与执行,分析实验结果,定位模型表现问题,提出改进方案并在时延要求内推动落地 7)前沿技术探索:跟踪 Post-training 领域最新研究进展(如 RLAIF、On-Policy Distillation、推理链压缩等),结合游戏对话业务需求进行技术预研与创新落地
包括英文材料
学历+
机器学习+
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
NLP+
https://www.youtube.com/watch?v=fNxaJsNG3-s&list=PLQY2H8rRoyvzDbLUZkbudP-MFQZwNmU4S
Welcome to Zero to Hero for Natural Language Processing using TensorFlow!
https://www.youtube.com/watch?v=R-AG4-qZs1A&list=PLeo1K3hjS3uuvuAXhYjV2lMEShq2UYSwX
Natural Language Processing tutorial for beginners series in Python.
https://www.youtube.com/watch?v=rmVRLeJRkl4&list=PLoROMvodv4rMFqRtEuo6SGjY4XbRIVRd4
The foundations of the effective modern methods for deep learning applied to NLP.
Transformer+
https://huggingface.co/learn/llm-course/en/chapter1/4
Breaking down how Large Language Models work, visualizing how data flows through.
https://poloclub.github.io/transformer-explainer/
An interactive visualization tool showing you how transformer models work in large language models (LLM) like GPT.
https://www.youtube.com/watch?v=wjZofJX0v4M
Breaking down how Large Language Models work, visualizing how data flows through.
PyTorch+
https://datawhalechina.github.io/thorough-pytorch/
PyTorch是利用深度学习进行数据科学研究的重要工具,在灵活性、可读性和性能上都具备相当的优势,近年来已成为学术界实现深度学习算法最常用的框架。
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
大模型+
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
vLLM+
https://www.newline.co/@zaoyang/ultimate-guide-to-vllm--aad8b65d
vLLM is a framework designed to make large language models faster, more efficient, and better suited for production environments.
https://www.youtube.com/watch?v=Ju2FrqIrdx0
vLLM is a cutting-edge serving engine designed for large language models (LLMs), offering unparalleled performance and efficiency for AI-driven applications.
还有更多 •••