Anker Innovations: Robotics Multimodal Large-Model Algorithm Engineer (PhD)
Qualifications
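Job requirements:
1. Master's degree or above in computer science, artificial intelligence, robotics, or a related field.
2. Proficient in the Transformer architecture and the large-model technology stack (fine-tuning/deployment); command of reinforcement learning (PPO/SAC) or imitation learning (BC/GAIL) frameworks.
3. Skilled in PyTorch/TensorFlow; proficient in Python/C…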
Responsibilities
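Job responsibilities:
1. Develop embodied-intelligence cognitive architectures (VLM/VLA/VLN) that deliver multimodal instruction understanding, long-horizon task planning, and autonomous navigation.
2. Design reinforcement learning (RL)/imitation learning (IL) decision-making frameworks that address sparse rewards in open-world scenarios.
3. Optimize model architectures and improve computational efficiency (model pruning/quantization) to overcome on-device deployment challenges.
4. Lead Sim2Real transfer from simulation (Isaac Gym/MuJoCo) to real hardware (humanoid robots/robotic arms).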
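1. Drive the engineering deployment of robotic multimodal large models (VLM/VLA) and 3D perception algorithms, covering pre-training, fine-tuning, training acceleration, and performance tuning.
2. Build simulation environments on Isaac Sim to validate manipulation models, and design a real2sim2real transfer framework to speed up algorithm validation and deployment.
3. Research and develop embodied-intelligence algorithms, exploring different data mixture ratios, network architectures, and robot embodiment configurations, to achieve long-horizon task execution and skill generalization in consumer-facing (toC) scenarios.
4. Develop automatic annotation algorithms (2D/3D/VLA, etc.) to reduce labeling cost and improve label quality.
5. Design generation algorithms for multimodal data (images, video, point clouds, etc.) to increase data diversity.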
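1. Research and develop multimodal large-model algorithms for robots, covering but not limited to language, image, video, and point-cloud modalities, for use in robot environment perception, decision-making, and planning and control.
2. Design, develop, and validate multimodal large-model algorithms, using simulation, data closed loops, and similar methods to control and quantify the effect of each algorithm iteration.
3. Develop world models and generative models and build a closed-loop rendering system to support the training of end-to-end models.
4. Conduct in-depth research on cutting-edge algorithms and explore the feasibility of deploying them in concrete scenarios.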
Team Introduction: The team is dedicated to building an industry-leading large-model dialogue system. It serves hundreds of millions of daily active users, with application scenarios covering the entire Douyin e-commerce ecosystem, including core business scenarios such as platform customer service, platform merchant service, merchant customer service, influencer customer service, and innovative intelligent shopping guides. Through continuous technical innovation and optimization, the team has built a complete intelligent dialogue solution that has delivered significant efficiency gains and user-experience improvements for the e-commerce business.

Research Objectives: Develop an LLM-based customer service chatbot for TikTok and Douyin E-commerce, in which the LLM handles the entire service process for a user inquiry, including request clarification, solution negotiation, and execution.

Necessity: LLMs' strong conversational and reasoning abilities make intelligent customer service one of the most typical scenarios in which they can deliver value, with the potential to match the service quality of excellent human representatives.

Research Content: Design an LLM-based multi-agent framework that integrates a planning-agent, a reply-agent, and a tool-agent. Each agent specializes in a different capability, and the agents work together to manage the complete service process, from issue identification and solution negotiation to solution execution and result feedback.
1) Reply-agent: ensures that the solutions offered to the user comply with platform policies and service guidelines, avoids excessive improvisation and hallucination, and keeps communication and negotiation with the user smooth.
2) Planning-agent: identifies the user's demands and problem scenario so that the relevant service guidelines and constraints can be retrieved externally, and recognizes risk scenarios.
3) Tool-agent: validates the legality of tool calls, receives and correctly interprets the results returned by tools, and manages execution dependencies between actions.

Research Challenges:
1) Compliance with service guidelines: ensuring that the chatbot's solutions adhere to platform service guidelines (for example, a refund may be requested within xx days of parcel delivery, and coupons issued to the same user may not exceed xx in value per week).
2) Dynamic feedback adaptation: a reply-agent that rigidly follows static service rules and offers fixed solutions cannot be as flexible as an excellent human representative. By letting the reply-agent interact with its environment in real time, taking into account the user's behavior before the inquiry, the demands expressed during it, and the user's feedback on proposed solutions, the system can provide personalized service, respond to real-time feedback, and offer progressively refined solutions, much like a skilled human representative.
3) Self-reflection: using the LLM's ability to understand, analyze, and evaluate its own behavior so that it can supervise itself and refine its decisions by reflecting on its outputs, particularly on complex and ambiguous tasks.
4) Complex image understanding: handling the many complex images found in e-commerce, such as photos of shipping orders, bank transaction screenshots, photos of received goods with missing parts or damage, and seller qualification certificates. These images often contain key information that is crucial for improving the chatbot's problem-solving capability.