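Alibaba Group Security Department - Reinforcement Learning/Agent Algorithm Engineer/Expert - Behavioral Risk Control
Experienced hire | Full-time | 3+ years of experience | Location: Beijing | Status: Hiring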
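Qualifications
1. Master's degree or above; majors in computer science, artificial intelligence, software, information security, statistics, or mathematics preferred;
2. 3+ years of R&D experience related to large models/reinforcement learning, with a deep understanding of and experience in RLHF/Agent training;
3. Hands-on experience in business risk control (cheating, fraud, account security, malicious behavior, and similar areas), with keen insight into risk data (logs, behavior sequences, user profiles, graph data)…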
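Responsibilities
This role targets behavioral risk control, a highly complex and dynamically adversarial business scenario. It supports core risk-control businesses such as anti-scraping, cheating, fraud, account security, and malicious behavior, and focuses on core technologies such as large-model reinforcement learning and Agents to build a next-generation intelligent risk-control foundation and an industry-leading foundation-model solution for the behavior domain.
1. For the behavior domain's structured, sequential, and graph- and table-based data, build post-training and evaluation schemes for structured data, and produce a behavior-specific "world model";
2. Around the complex tasks of behavioral risk control, design and iterate on reinforcement learning schemes, including but not limited to Reward System, RL, complex decision-making, and self-play, to build end-to-end intelligence analysis and risk decision capabilities;
3. For scenarios such as behavior analysis, identification, mining, chain reconstruction, and path inference, design scalable Agent training environments and iteration schemes.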
Includes English-language materials
Education
Large models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
Reinforcement learning
https://cloud.google.com/discover/what-is-reinforcement-learning?hl=en
Reinforcement learning (RL) is a type of machine learning where an "agent" learns optimal behavior through interaction with its environment.
https://huggingface.co/learn/deep-rl-course/unit0/introduction
This course will teach you about Deep Reinforcement Learning from beginner to expert. It’s completely free and open-source!
https://www.kaggle.com/learn/intro-to-game-ai-and-reinforcement-learning
Build your own video game bots, using classic and cutting-edge algorithms.
RLHF
[English] What is RLHF?
https://aws.amazon.com/what-is/reinforcement-learning-from-human-feedback/
Reinforcement learning from human feedback (RLHF) is a machine learning (ML) technique that uses human feedback to optimize ML models to self-learn more efficiently.
https://www.ibm.com/think/topics/rlhf
Reinforcement learning from human feedback (RLHF) is a machine learning technique in which a “reward model” is trained with direct human feedback, then used to optimize the performance of an artificial intelligence agent through reinforcement learning.