DiDi 2026 Future Elite - Building Agents with Reinforcement Learning
Campus recruitment; Full-time; L lab; Location: Beijing; Status: Hiring
Job Requirements
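Agents are one of the hottest topics in the large-model field today. Traditional agent approaches complete complex tasks by dividing the work into modules or orchestrating workflows. However, the emergence of O1-style models and DeepResearch-style applications has shown that end-to-end training based on reinforcement learning may be the best way to raise the ceiling of agent capability.
- How to design a reinforcement learning framework that uses compute resources efficiently, so that reinforcement learning tasks for models with large parameter counts run stably and efficiently.
- How to design rewards that effectively improve model capability, and how to build rule-based rewards in open domains while avoiding reward hacking.
- How to design a high-quality, highly diverse prompt pool in open domains for running reinforcement learning tasks.
- How to evaluate open-domain tasks effectively.
Each of these problems is extremely challenging, but solving any of them well would create enormous value.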
Job Responsibilities
None
Includes English-language materials
AI agent
https://www.ibm.com/think/ai-agents
Your one-stop resource for gaining in-depth knowledge and hands-on applications of AI agents.
Large models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
Reinforcement learning
https://cloud.google.com/discover/what-is-reinforcement-learning?hl=en
Reinforcement learning (RL) is a type of machine learning where an "agent" learns optimal behavior through interaction with its environment.
https://huggingface.co/learn/deep-rl-course/unit0/introduction
This course will teach you about Deep Reinforcement Learning from beginner to expert. It’s completely free and open-source!
https://www.kaggle.com/learn/intro-to-game-ai-and-reinforcement-learning
Build your own video game bots, using classic and cutting-edge algorithms.
Prompt
https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/introduction-prompt-design
A prompt is a natural language request submitted to a language model to receive a response back.
https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/prompt-engineering
These techniques aren't recommended for reasoning models like gpt-5 and o-series models.
https://www.youtube.com/watch?v=LWiMwhDZ9as
Learn and master the fundamentals of Prompt Engineering and LLMs with this 5-HOUR Prompt Engineering Crash Course!