腾讯腾讯广告-算法工程师-强化学习方向
社招全职3年以上腾讯广告技术地点:深圳状态:招聘
任职要求
1.计算机/统计学/运筹学硕士及以上学历,1-3年强化学习实战经验; 2.扎实的强化学习理论基础,掌握MDP、贝尔曼方程等核心理论框架,深入理解DQN、PPO、DDPG等算法原理,具备改进算法效率和稳定性能力。同时有传统机器学习和深度学习知识背景,熟悉Transformer/Attention等原理和应用; 3.…
登录查看完整任职要求
微信扫码,1秒登录
工作职责
1.多目标强化学习算法开发与调优。基于业务场景构建DQN、PPO、SAC等算法的改进框架,针对延迟奖励稀疏性设计分层强化学习架构。搭建离线仿真环境与在线AB测试闭环,设计动态滑动窗口评估机制,量化算法迭代效果; 2.效果瓶颈分析与突破。构建强化学习可解释性分析工具(如SHAP值、注意力热力图),定位状态表征缺失/奖励函数偏差/探索不足等瓶颈。设计课程学习机制,通过渐进式难度提升策略解决稀疏奖励场景下的策略退化问题; 3.状态与奖励机制创新。构建异构特征融合模型,集成用户实时行为序列(LSTM)、跨场景偏好迁移(Meta Learning)等高阶状态表征。设计复合奖励函数,融合稠密奖励(点击行为)与稀疏奖励(购买行为),引入基于KL散度的奖励塑形技术; 4.跟踪深度学习、计算广告、推荐系统,deepseek等最新前沿技术,应用到多目标排序。
包括英文材料
学历+
强化学习+
https://cloud.google.com/discover/what-is-reinforcement-learning?hl=en
Reinforcement learning (RL) is a type of machine learning where an "agent" learns optimal behavior through interaction with its environment.
https://huggingface.co/learn/deep-rl-course/unit0/introduction
This course will teach you about Deep Reinforcement Learning from beginner to expert. It’s completely free and open-source!
https://www.kaggle.com/learn/intro-to-game-ai-and-reinforcement-learning
Build your own video game bots, using classic and cutting-edge algorithms.
算法+
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
机器学习+
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
深度学习+
https://d2l.ai/
Interactive deep learning book with code, math, and discussions.
Transformer+
https://huggingface.co/learn/llm-course/en/chapter1/4
Breaking down how Large Language Models work, visualizing how data flows through.
https://poloclub.github.io/transformer-explainer/
An interactive visualization tool showing you how transformer models work in large language models (LLM) like GPT.
https://www.youtube.com/watch?v=wjZofJX0v4M
Breaking down how Large Language Models work, visualizing how data flows through.
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
中文,免费,零起点,完整示例,基于最新的Python 3版本。
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Java+
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
还有更多 •••
相关职位
校招智能信息秋季20
1. 开展大语言模型、多模态生成/理解大模型复杂推理能力、可信性研究和奖励模型,包括但不限于自然语言处理、视觉理解、多模态生成与理解等; 2. 开展多模态大模型后训练阶段的强化学习算法研发迭代,包括:基于人类、AI、环境反馈的强化学习算法的优化升级,覆盖规则遵循、复杂推理等多个任务的多目标强化学习训练算法研发和调优,设计并实施实验; 3. 关注和学习最新前沿研究,参与学术讨论和技术交流,撰写研究报告、技术文档或论文,鼓励在国际顶级期刊或会议上发表研究成果。
更新于 2025-08-13北京|杭州
社招网易游戏(互娱)
- 参与强化学习、模仿学习、进化算法的落地工作,包括但不限于智能体、平衡性测试等; - 基于强化学习、模仿学习等AI技术为游戏产品打造更强力、更多样、更拟人的AI机器人; - 参与开发强化学习训练和部署平台。
更新于 2025-06-05广州