Meituan [Beidou] Large Model Algorithm Researcher (Agent/RL/Reasoning)
Campus recruitment | Full-time | Core Local Commerce - Business R&D Platform | Location: Beijing | Status: Hiring
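Requirements
[Qualifications]
Must-have:
1. Class of 2027 student or new graduate in computer science, mathematics, statistics, or a related field; bachelor's degree or above, PhD/master's preferred
2. Solid foundation in machine learning and deep learning; familiar with the Transformer architecture and its variants; able to independently read and reproduce top-conference papers
3. Proficient in Python and mainstream frameworks such as PyTorch/JAX, with a clear sense of code engineering
4. Systematic understanding of the large-model training pipeline (pre-training/post-training) or of Agent construction; able to independently complete end-to-end experiments
5. Hands-on experience training and tuning RLHF/DPO/GRPO or other alignment algorithms, with a deep understanding of the associated data construction
Nice to have:
1. Familiar with the design and implementation of open-source harnesses such as ClaudeCode, OpenClaw, and Hermes
2. Papers published (including under submission) at top conferences such as NeurIPS/ICML/ICLR/ACL/EMNLP, or widely cited open-source projects
3. Experience with Agent systems (such as…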
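Responsibilities
[Vision]
Build a fulfillment technology platform that stays globally ahead and earns customers' long-term trust, and create a delivery brand that is the market's first choice, recognized by society, and serves 1 billion users.
[What you will work on]
Direction 1: Research and deployment of the Agent technology stack
1. Design and build an Agent technology stack for real business scenarios, covering core modules such as task planning, tool calling, multi-turn reasoning, and self-reflection and error correction
2. Abstract the core business problems in depth, establish quantifiable evaluation and feedback signals, and drive measurable improvements in Agent performance in complex fulfillment scenarios
3. Explore Multi-Agent collaboration frameworks, and study the reliability, safety, and cost efficiency of Agents at a concurrency scale in the tens of millions
Direction 2: LLM post-training algorithm research
1. Own end-to-end research and engineering deployment of instruction fine-tuning (SFT) and preference alignment (RLHF/DPO/GRPO, etc.) for specific business scenarios
2. Research frontier directions such as Scalable Oversight, continual learning, and reinforcement learning from AI and environment feedback (RLXF). Also explore reward models and feedback mechanisms, and generalizable fine-grained process supervision and reward modeling, to raise the model's capability ceiling on complex reasoning and tool-calling tasks
3. Lead quality engineering for training data, including data cleaning, synthetic data construction, and annotation workflow design
4. Build customized vertical-domain models and pursue breakthroughs in domain cognitive intelligence, exploring directions such as self-evolving architecture design for few-shot scenarios and trustworthy reasoning mechanisms
Direction 3: Evaluation and data infrastructure
1. Design a multi-dimensional evaluation system covering Agent behavior, model capability, and business metrics, and build automated diagnosis and attribution pipelines
2. Work closely with business teams to build an end-to-end train-evaluate-iterate loop, turning research results into quantifiable online business gains
Direction 4: Tracking the frontier and external output
1. Continuously track the latest progress at top conferences such as NeurIPS/ICML/ICLR/ACL, with the ability to quickly turn frontier papers into working engineering
2. You are encouraged to write up internal research results as academic papers and contribute technical influence to the industry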
Includes English-language materials
Machine Learning
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
Deep Learning
https://d2l.ai/
Interactive deep learning book with code, math, and discussions.
Transformer
https://huggingface.co/learn/llm-course/en/chapter1/4
Breaking down how Large Language Models work, visualizing how data flows through.
https://poloclub.github.io/transformer-explainer/
An interactive visualization tool showing you how transformer models work in large language models (LLM) like GPT.
https://www.youtube.com/watch?v=wjZofJX0v4M
Breaking down how Large Language Models work, visualizing how data flows through.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, starts from zero, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
PyTorch
https://datawhalechina.github.io/thorough-pytorch/
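PyTorch is an important tool for data science research with deep learning. It offers considerable advantages in flexibility, readability, and performance, and in recent years it has become the framework most commonly used in academia to implement deep learning algorithms.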
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
JAX
https://docs.jax.dev/en/latest/notebooks/thinking_in_jax.html
JAX is a library for array-oriented numerical computation, with automatic differentiation and JIT compilation to enable high-performance machine learning research.
Large Models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
AI agent
https://www.ibm.com/think/ai-agents
Your one-stop resource for gaining in-depth knowledge and hands-on applications of AI agents.
RLHF
[English] What is RLHF?
https://aws.amazon.com/what-is/reinforcement-learning-from-human-feedback/
Reinforcement learning from human feedback (RLHF) is a machine learning (ML) technique that uses human feedback to optimize ML models to self-learn more efficiently.
https://www.ibm.com/think/topics/rlhf
Reinforcement learning from human feedback (RLHF) is a machine learning technique in which a “reward model” is trained with direct human feedback, then used to optimize the performance of an artificial intelligence agent through reinforcement learning.