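贝壳 (Beike) Omni-Modal Algorithm Engineer (Agent Track) [Internship] (J71299)
Internship / Part-time | Category: Algorithms | Location: Beijing | Status: Hiring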
Requirements
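1. Master's degree or above, graduating between September 2026 and August 2027, in computer science, artificial intelligence, software engineering, or a related field.
2. Solid theoretical grounding in natural language processing and deep learning; familiar with Transformer and other mainstream large-model architectures and their training mechanisms; understands the pre-training and alignment paradigms.
3. Hands-on experience with fine-tuning and alignment optimization of large language models, including SFT, RLHF, and DPO; candidates with end-to-end experience in training, hyperparameter tuning, and performance optimization are preferred.
4. Familiar with core LLM Agent techniques, including but not limited to:
   * Prompt En…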
Responsibilities
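1. Contribute to building the core capabilities of Beike's omni-modal agent: carry out continued pre-training, instruction fine-tuning, and alignment optimization (SFT / RLHF / DPO / RLAIF, etc.) on large language models (LLMs) to improve reasoning, planning, and decision-making in complex business scenarios.
2. Design and optimize LLM-based Agent systems, covering task decomposition, chained reasoning, multi-step decision-making, tool calling (Tool Use / Function Calling), RAG augmentation, and long- and short-term memory mechanisms.
3. Build agent solutions for vertical business scenarios, improving stability, controllability, and generalization on complex tasks such as multi-turn dialogue, workflow execution, and structured information understanding and generation.
4. Drive text-centered omni-modal capability fusion; explore unified modeling and alignment mechanisms across text, vision, speech, and other modalities to improve cross-modal understanding and interaction.
5. Build the data and evaluation system for omni-modal agents, including instruction data construction, preference data generation, automated benchmark design, and effectiveness evaluation, to support continuous model iteration.
6. Work closely with engineering and product teams to deploy omni-modal agent systems at scale in real business scenarios and optimize their performance.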
Related learning materials (includes English-language resources)
Education
NLP
https://www.youtube.com/watch?v=fNxaJsNG3-s&list=PLQY2H8rRoyvzDbLUZkbudP-MFQZwNmU4S
Welcome to Zero to Hero for Natural Language Processing using TensorFlow!
https://www.youtube.com/watch?v=R-AG4-qZs1A&list=PLeo1K3hjS3uuvuAXhYjV2lMEShq2UYSwX
Natural Language Processing tutorial for beginners series in Python.
https://www.youtube.com/watch?v=rmVRLeJRkl4&list=PLoROMvodv4rMFqRtEuo6SGjY4XbRIVRd4
The foundations of the effective modern methods for deep learning applied to NLP.
Deep learning
https://d2l.ai/
Interactive deep learning book with code, math, and discussions.
Transformer
https://huggingface.co/learn/llm-course/en/chapter1/4
Breaking down how Large Language Models work, visualizing how data flows through.
https://poloclub.github.io/transformer-explainer/
An interactive visualization tool showing you how transformer models work in large language models (LLM) like GPT.
https://www.youtube.com/watch?v=wjZofJX0v4M
Breaking down how Large Language Models work, visualizing how data flows through.
Large language models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
SFT
https://cameronrwolfe.substack.com/p/understanding-and-using-supervised
Understanding how SFT works from the idea to a working implementation...
RLHF
[English] What is RLHF?
https://aws.amazon.com/what-is/reinforcement-learning-from-human-feedback/
Reinforcement learning from human feedback (RLHF) is a machine learning (ML) technique that uses human feedback to optimize ML models to self-learn more efficiently.
https://www.ibm.com/think/topics/rlhf
Reinforcement learning from human feedback (RLHF) is a machine learning technique in which a “reward model” is trained with direct human feedback, then used to optimize the performance of an artificial intelligence agent through reinforcement learning.