
XPeng Motors: Research Intern (Multimodal)

Type: Internship, part-time
Location: Shenzhen | Beijing | Shanghai
Status: Hiring

Requirements


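1. Currently enrolled in a bachelor's program or above in computer science, electronic engineering, artificial intelligence, or a related field
2. Solid foundation in machine learning algorithms, with research experience in computer vision, natural language processing, graphics, or a related field
3. Proficient with the PyTorch or TensorFlow deep learning frameworks, with strong coding skills
4. Good teamwork and communication skills
[Preferred qualifications]
1. Previously…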

Responsibilities


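1. Build industry-leading, embodied-intelligence-native multimodal large models and world models with the potential to be applied to general-purpose humanoid robots and a wider range of embodied scenarios
2. Build technical influence and lead the development of the industry internationally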
Includes English-language materials
Tags: Education, Machine Learning, Algorithms, OpenCV, NLP, PyTorch, TensorFlow, Deep Learning
Related positions

XPeng, Internship

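We are committed to building the next generation of Spatial Intelligence systems, so that AI can not only "see and understand the world" but also understand spatial structure, reason about object relationships, plan action trajectories, and keep learning and evolving in virtual or real environments.

Together with the team, you will:
·      Develop agent models capable of spatial understanding, object perception, trajectory prediction, and interaction planning
·      Build systems that combine vision-language models (VLM) with world models to jointly model 3D scenes, depth, physics, and affordance
·      Use game engines (Unreal / Unity / Isaac Sim) to build high-fidelity virtual environments for data generation and agent evaluation
·      Build efficient multimodal data pipelines on vLLM / Ray for large-scale generation, automatic labeling, and validation
·      Drive the real-world application of spatial intelligence in robotics and embodied intelligence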

Updated 2025-10-27, Shenzhen
Apple, Internship, Machine

The computer vision algorithm intern will work in a dynamic team within the Video Engineering org, which develops multi-modality-based video quality assessment technologies on Apple platforms. We balance research and product to deliver the highest-quality, state-of-the-art experiences, innovating through the full stack and partnering with cross-functional teams to influence what brings our vision to life and into customers' hands.
Keywords: Multi-Modal LLM; Video Quality Assessment; Post-training

Updated 2025-11-04, Beijing
Apple, Internship, Machine

The computer vision algorithm intern will work in a dynamic team within the Video Engineering org, which develops on-device computer vision and machine perception technologies across Apple's products. We balance research and product to deliver the highest-quality, state-of-the-art experiences, innovating through the full stack and partnering with cross-functional teams to influence what brings our vision to life and into customers' hands.
Keywords: Agentic AI; Multi-Modal LLM; Video Foundation Model; Video Generative Editing

Updated 2025-10-21, Beijing
Meitu, Internship, Algorithms

Location & Duration
Sydney Central; 6-12 months

Role Overview
You will participate in the research and development of human aesthetic enhancement and spatiotemporally consistent editing technologies at Meitu. You will work directly with real, product-scale datasets and state-of-the-art algorithms. Depending on the internship track, your work may include (but is not limited to):
·      Fine-grained and controllable image / video aesthetic enhancement
·      2D / 3D human tracking and 3D reconstruction
·      Regression, reconstruction, and structural constraints of digital human models (e.g., SMPL)
This role offers the opportunity to produce both production-ready technical outcomes and high-quality academic research results. It is a research-and-engineering-oriented internship, ideal for candidates with strong interest and capability in 3D vision fundamentals, human visual quality enhancement, video generation models, and 3D human modelling.

Key Responsibilities
·      Research and implement algorithms related to depth estimation, multi-view generation, and 2D / 3D tracking with spatiotemporal reconstruction
·      Follow state-of-the-art 3D vision papers and open-source projects; reproduce experiments and adapt methods to practical applications
·      Collaborate with data teams to refine the 3D aesthetic development pipeline, improve data collection and quality evaluation, and establish foundations for high-quality scaling
·      Explore the integration of human structure priors (Skeleton / SMPL / Mesh) with multi-modal cues such as depth, normals, and optical flow in reconstruction and generative models
·      Assist in building data processing, evaluation, and visualization tools (e.g., immersive video aesthetic editing) to support rapid iteration
·      Enable high-quality projection of 3D features into 2D visual outputs, with the goal of producing A-level or above academic publications

Updated 2026-01-21