
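Dewu (得物) Large Model Platform Data Architect (Risk Control Track)
Experienced hire | Full-time | 5+ years | Category: Risk Control | Location: Shanghai | Hangzhou | Status: Hiring
Requirements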
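1. Basic requirements
- Bachelor's degree or above in computer science, data science, statistics, mathematics, or a related field; 5+ years of big data experience in the internet industry; experience designing big data architectures for e-commerce or risk control is preferred.
- Expert in the big data technology stack: proficient in the principles and application of mainstream components such as Hadoop, Spark, Flink, Hive, and HBase, and able to independently design and build large-scale data platforms.
- Hands-on experience providing data support for large models: familiar with the data preparation workflow for large model training/fine-tuning, and understands how vector databases (Milvus/Weaviate, etc.), RAG, and Agent frameworks (LangChain, etc.) are applied.
- Proficient in at least one of Python/Java/Scala, with solid coding skills and an engineering mindset; able to independently develop and optimize data processing modules and data interfaces.
…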
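Responsibilities
- Own the architecture design and delivery of the big data layer of the large model platform, covering the full pipeline of data collection, cleaning, governance, storage, feature engineering, and data services, and build a large model data support system tailored to risk control scenarios.
- Break down the data requirements of core risk control scenarios (such as transaction fraud identification, fake transaction interception, non-compliant product detection, and malicious negative review prevention), and design targeted data solutions that fit how business colleagues use their custom Agents.
- Lead the build-out of a dedicated data service system for large model Agents, including but not limited to the RAG (retrieval-augmented generation) data foundation, vector database deployment and optimization, and real-time/offline data compute scheduling, so that business colleagues can efficiently access data to train and use Agents.
- Own the integration between the big data platform and the large model platform, resolving compatibility issues, performance bottlenecks, and security and compliance problems in data flows, and build stable, efficient data pipelines that support the data needs of the full Agent lifecycle.
- Lead the development of the data quality control system: define standards for data labeling, filtering, and evaluation, and improve data quality through techniques such as data augmentation and synthesis, providing a core guarantee for the accuracy of large model Agents.
- Track frontier technologies where large models and big data converge (such as the LangChain/LlamaIndex ecosystem and multimodal data processing), drive their adoption in risk control scenarios, and continuously improve the data architecture and service capabilities.
- Work with business teams, large model algorithm teams, and engineering teams to establish a closed loop of "business requirements - data support - model optimization", responding quickly to the data pain points business colleagues encounter when customizing Agents.
- Own the team's internal accumulation and sharing of big data expertise, lead the definition of data-related standards, and raise the team's professional capability in large model data support.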
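As a rough illustration of the retrieval step behind the "RAG data foundation" mentioned above, the following is a minimal sketch only. It assumes a toy bag-of-words embedding and an in-memory list in place of a real embedding model and a vector database such as Milvus; the function names (`embed`, `retrieve`, `build_prompt`) and the sample documents are hypothetical and do not come from the posting.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return up to k documents most similar to the query, dropping zero-score hits."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]


def build_prompt(query, documents, k=2):
    """Assemble retrieved context and the question into a prompt for an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical risk-control knowledge snippets, for illustration only.
DOCS = [
    "fake transaction interception rules for seller accounts",
    "malicious negative review detection signals",
    "transaction fraud identification using device fingerprints",
]

if __name__ == "__main__":
    print(build_prompt("how to detect malicious negative review", DOCS))
```

In a production setting the in-memory list would typically be replaced by a vector database query and the toy embedding by a learned embedding model; the retrieve-then-assemble flow stays the same.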
Includes English-language materials
Education+
Data Science+
https://roadmap.sh/ai-data-scientist
Step by step roadmap guide to becoming an AI and Data Scientist
Big Data+
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
System Design+
https://roadmap.sh/system-design
Everything you need to know about designing large scale systems.
https://www.youtube.com/watch?v=F2FmTdLtb_4
This complete system design tutorial covers scalability, reliability, data handling, and high-level architecture with clear explanations, real-world examples, and practical strategies.
Hadoop+
https://www.runoob.com/w3cnote/hadoop-tutorial.html
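Hadoop provides reliable, scalable application-layer computing and storage support for very large computer clusters. It allows large data sets to be processed in a distributed fashion across clusters of computers using simple programming models, and scales from a single machine to several thousand machines.
[English] Hadoop Tutorial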
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models.
Spark+
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink+
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Hive+
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
HBase+
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model that is similar to Google's big table designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with HBase shell.
Large Language Models+
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
Milvus+
[English] Tutorials Overview
https://milvus.io/docs/tutorials-overview.md
This page provides a list of tutorials for you to interact with Milvus.
https://www.baeldung.com/milvus-tutorial-intro
In this tutorial, we’ll explore Milvus, a highly scalable open-source vector database.
https://www.youtube.com/watch?v=7ejr_ZzU9jw
Discover the power of Milvus, an open-source vector database revolutionizing AI applications.
https://www.youtube.com/watch?v=Yhv19le0sBw
Vector databases have been trending recently as they power modern search, recommendations, and AI-driven applications.
RAG+
https://www.youtube.com/watch?v=sVcwVQRHIc8
Learn how to implement RAG (Retrieval Augmented Generation) from scratch, straight from a LangChain software engineer.
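Related Positions
Experienced hire A199302
1. Develop the Volcano Engine Ark (火山引擎-方舟) large model platform: research systematic solutions for deploying large models across industries, substantially reduce the IT cost of large model applications, meet users' growing demand for intelligent interaction, and comprehensively improve how users live and communicate in the future.
2. Work across multiple sub-areas of machine learning systems, including resource scheduling, model training, model inference, data management, and workflow orchestration.
Updated 2023-11-01 | Hangzhou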
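Experienced hire A96161
1. Develop the Volcano Engine Ark (火山引擎-方舟) large model platform: research systematic solutions for deploying large models across industries, substantially reduce the IT cost of large model applications, meet users' growing demand for intelligent interaction, and comprehensively improve how users live and communicate in the future.
2. Work across multiple sub-areas of machine learning systems, including resource scheduling, model training, model inference, data management, and workflow orchestration.
Updated 2023-11-01 | Beijing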
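Experienced hire A189998
1. Develop the Volcano Engine Ark (火山引擎-方舟) large model platform: research systematic solutions for deploying large models across industries, substantially reduce the IT cost of large model applications, meet users' growing demand for intelligent interaction, and comprehensively improve how users live and communicate in the future.
2. Work across multiple sub-areas of machine learning systems, including resource scheduling, model training, model inference, data management, and workflow orchestration.
Updated 2023-11-01 | Shanghai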