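Alibaba, Alibaba International Station (阿里国际站) - AI Agent Training Data Engineer - Hangzhou
Experienced hire | Full-time | 3+ years of experience | Technical - Data | Location: Hangzhou | Status: Hiring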
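Requirements
1. 3+ years of experience in data engineering, machine learning data, or LLM data.
2. Proficient in Python; familiar with at least one of Spark / Flink / MaxCompute / Ray.
3. Understanding of data construction workflows for SFT / Preference / RLHF / RLAIF / Eval.
4. Experience with complex…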
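Responsibilities
We are building AI Agents for real cross-border trade scenarios, covering complex tasks such as buyer sourcing, merchant operations, Research, product listing, cross-platform operations, and logistics tracking.
The core of this role is not traditional data warehousing or ETL. It is building Agent training data, trajectory data, evaluation data, and closed-loop data systems.
What you will do
1. Build the Agent / LLM data production Pipeline: collection, cleaning, splitting, deduplication, structuring, and version management.
2. Construct training data for SFT, Preference, Tool Use, Function Call, multi-step planning, and task trajectories.
3. Mine high-value samples from Queries, conversations, product data, behavior logs, and operation trajectories.
4. Build Agent Benchmark / Eval datasets covering task completion rate, tool-call accuracy, multilingual quality, and more.
5. Establish mechanisms for data quality inspection, spot checks, anomaly detection, annotation collaboration, and automated evaluation.
6. Work with algorithm, product, and engineering teams on the "data - training - evaluation - online feedback" closed loop.
Includes English-language materials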
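As a rough illustration of items 1 and 4 above, here is a minimal sketch of a cleaning and exact-deduplication step plus a simple tool-call accuracy metric. It is not the team's actual pipeline: the record field `query`, the tool names `search_products` and `get_supplier`, and the step-by-step exact-match scoring rule are all assumptions made for this example.

```python
import hashlib
import unicodedata


def clean_text(text: str) -> str:
    """Normalize Unicode (NFKC) and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def dedup_samples(samples: list[dict]) -> list[dict]:
    """Drop empty samples and case-insensitive exact duplicates, keeping first occurrences."""
    seen: set[str] = set()
    kept = []
    for sample in samples:
        query = clean_text(sample["query"])  # "query" is a hypothetical field name
        if not query:
            continue
        digest = hashlib.sha256(query.lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        # A content-derived id gives each sample a stable key across dataset versions.
        kept.append({**sample, "query": query, "id": digest[:12]})
    return kept


def tool_call_accuracy(predicted: list[dict], expected: list[dict]) -> float:
    """Fraction of expected tool calls matched exactly (name and arguments) at the same step.

    Assumed scoring rule: calls are compared position by position, and
    predicted calls beyond the expected length are ignored.
    """
    if not expected:
        return 1.0 if not predicted else 0.0
    hits = sum(1 for p, e in zip(predicted, expected) if p == e)
    return hits / len(expected)


raw = [
    {"query": "Track  my order\u3000status"},  # double space and a full-width space
    {"query": "track my order status"},        # duplicate after normalization
    {"query": "   "},                          # empty after cleaning
    {"query": "Find LED lamp suppliers"},
]
cleaned = dedup_samples(raw)
print([s["query"] for s in cleaned])

# Hypothetical tool names and arguments, for illustration only.
expected_calls = [
    {"name": "search_products", "args": {"keyword": "LED lamp"}},
    {"name": "get_supplier", "args": {"supplier_id": "s1"}},
]
predicted_calls = [
    {"name": "search_products", "args": {"keyword": "LED lamp"}},
    {"name": "get_supplier", "args": {"supplier_id": "s2"}},
]
print(tool_call_accuracy(predicted_calls, expected_calls))
```

In practice, near-duplicate detection (for example MinHash) and distributed execution on Spark or Ray would likely replace the in-memory set used here; the sketch only shows the shape of the steps.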
Machine Learning
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
Large Language Models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for complete beginners, with full examples, based on the latest Python 3 release.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction to all of the core concepts in Python.
Spark
[English] Learning Spark (book)
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.