XPeng Motors Research Intern (Multimodal)
Qualifications
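1. Currently enrolled in a bachelor's program or above in computer science, electronic engineering, artificial intelligence, or a related field
2. Solid foundation in machine learning algorithms, with research experience in computer vision, natural language processing, graphics, or a related field
3. Proficient with deep learning frameworks such as PyTorch/TensorFlow, with strong coding skills
4. Good teamwork and communication skills
[Preferred qualifications]
1. Previously…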
Responsibilities
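1. Build industry-leading native multimodal large models and world models for embodied intelligence, with the potential to be applied to general-purpose humanoid robots and a wider range of embodied scenarios
2. Build technical influence and lead the development of the field internationally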
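We are building the next generation of Spatial Intelligence systems, so that AI can not only "see and understand the world" but also understand spatial structure, reason about relationships between objects, plan action trajectories, and keep learning and evolving in virtual or real environments.
You will work with the team to:
Develop agent models capable of spatial understanding, object perception, trajectory prediction, and interaction planning;
Build systems that combine vision-language models (VLM) with world models (World Model) to jointly model 3D scenes, depth, physics, and affordance;
Use game engines (Unreal / Unity / Isaac Sim) to build high-fidelity virtual environments for data generation and agent evaluation;
Build efficient multimodal data pipelines on vLLM / Ray for large-scale generation, automatic annotation, and validation;
Drive the application of spatial intelligence in robotics and embodied intelligence.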
The computer vision algorithm intern will work in a dynamic team within the Video Engineering org, which develops multimodal video quality assessment technologies for Apple platforms. We balance research and product to deliver the highest-quality, state-of-the-art experiences, innovating through the full stack and partnering with cross-functional teams to influence what brings our vision to life and into customers' hands. Keywords: Multi-Modal LLM; Video Quality Assessment; Post-training
The computer vision algorithm intern will work in a dynamic team within the Video Engineering org, which develops on-device computer vision and machine perception technologies across Apple's products. We balance research and product to deliver the highest-quality, state-of-the-art experiences, innovating through the full stack and partnering with cross-functional teams to influence what brings our vision to life and into customers' hands. Keywords: Agentic AI; Multi-Modal LLM; Video Foundation Model; Video Generative Editing
The computer vision algorithm intern will work in a dynamic team within the Video Engineering org, which develops on-device computer vision and machine perception technologies across Apple's products. We balance research and product to deliver the highest-quality, state-of-the-art experiences, innovating through the full stack and partnering with cross-functional teams to influence what brings our vision to life and into customers' hands. Keywords: Object Detection and Segmentation; Multi-Sensor Fusion; Activity Recognition; Video Captioning