miHoYo LLM Evaluation Algorithm Researcher
Campus recruitment | Full-time | Programming & Technology | Location: Shanghai, Beijing | Status: Hiring
Requirements
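1. Master's degree or above in computer science, AI, or a related field, with a solid foundation in machine learning theory
2. Familiarity with mainstream LLM evaluation frameworks and their limitations, and experience building private evaluation sets
3. Deep understanding of alignment algorithms such as RLHF, DPO, and PPO, and familiarity with the difficulties of training and evaluating a Reward Model
4. Strong sensitivity to data, with the ability to detect subtle changes in model capability from statistical data
Bonus points
1. At NeurI…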
Responsibilities
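1. Complex capability evaluation: Design and build automated evaluation sets and evaluation pipelines for difficult capabilities such as logical reasoning, code generation, and long-text understanding
2. Subjective preference modeling: Study Reward Model behavior in RLHF in depth and analyze Reward Hacking; establish fine-grained evaluation criteria to improve model alignment on open-ended generation tasks
3. Model-based Evaluation: Develop and optimize LLM-as-a-Judge techniques, training dedicated Critic Models to replace human annotators for large-scale, high-consistency automatic evaluation
4. Data-driven iteration: Build a feedback loop from evaluation results to training data, using bad-case analysis to guide SFT data mixture and post-training strategy adjustments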
Includes English-language materials
Education
Machine learning
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
Large language models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
RLHF
[English] What is RLHF?
https://aws.amazon.com/what-is/reinforcement-learning-from-human-feedback/
Reinforcement learning from human feedback (RLHF) is a machine learning (ML) technique that uses human feedback to optimize ML models to self-learn more efficiently.
https://www.ibm.com/think/topics/rlhf
Reinforcement learning from human feedback (RLHF) is a machine learning technique in which a “reward model” is trained with direct human feedback, then used to optimize the performance of an artificial intelligence agent through reinforcement learning.
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
NeurIPS
https://neurips.cc/