XPeng Motors Research Intern (Multimodal)
Requirements
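1. Currently pursuing a bachelor's degree or higher in computer science, electrical engineering, artificial intelligence, or a related field
2. Solid foundation in machine learning algorithms, with research experience in computer vision, natural language processing, computer graphics, or a related field
3. Proficient with deep learning frameworks such as PyTorch/TensorFlow, with strong coding skills
4. Good teamwork and communication skills
[Preferred Qualifications]
1. Have previously…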
Responsibilities
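1. Build industry-leading, embodied-AI-native multimodal large models and world models with the potential to be applied to general-purpose humanoid robots and a broader range of embodied scenarios
2. Build technical influence and lead the development of the field internationally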
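We are building the next generation of Spatial Intelligence systems: AI that not only "sees and understands the world" but also understands spatial structure, reasons about relationships between objects, plans action trajectories, and continuously learns and evolves in virtual or real environments. Together with the team, you will:
· Develop agent models capable of spatial understanding, object perception, trajectory prediction, and interaction planning
· Build systems that combine vision-language models (VLMs) and world models to jointly model 3D scenes, depth, physics, and affordances
· Use game engines (Unreal / Unity / Isaac Sim) to build high-fidelity virtual environments for data generation and agent evaluation
· Build efficient multimodal data pipelines on vLLM / Ray for large-scale data generation, automatic annotation, and validation
· Drive the real-world application of spatial intelligence in robotics and embodied AI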
The computer vision algorithm intern will work in a dynamic team within the Video Engineering org, which develops multimodal video quality assessment technologies for Apple platforms. We balance research and product to deliver the highest-quality, state-of-the-art experiences, innovating across the full stack and partnering with cross-functional teams to bring our vision to life and into customers' hands. Keywords: Multi-Modal LLM; Video Quality Assessment; Post-training
The computer vision algorithm intern will work in a dynamic team within the Video Engineering org, which develops on-device computer vision and machine perception technologies across Apple's products. We balance research and product to deliver the highest-quality, state-of-the-art experiences, innovating across the full stack and partnering with cross-functional teams to bring our vision to life and into customers' hands. Keywords: Agentic AI; Multi-Modal LLM; Video Foundation Model; Video Generative Editing

Location & Duration: Sydney Central; 6-12 months
Role Overview
You will participate in the research and development of human aesthetic enhancement and spatiotemporally consistent editing technologies at Meitu, working directly with real, product-scale datasets and state-of-the-art algorithms. Depending on the internship track, your work may include (but is not limited to):
· Fine-grained, controllable image / video aesthetic enhancement
· 2D / 3D human tracking and 3D reconstruction
· Regression, reconstruction, and structural constraints for digital human models (e.g., SMPL)
This role offers the opportunity to produce both production-ready technical outcomes and high-quality academic research. It is a research-and-engineering-oriented internship, ideal for candidates with strong interest and ability in 3D vision fundamentals, human visual quality enhancement, video generation models, and 3D human modelling.
Key Responsibilities
· Research and implement algorithms for depth estimation, multi-view generation, and 2D / 3D tracking with spatiotemporal reconstruction
· Follow state-of-the-art 3D vision papers and open-source projects; reproduce experiments and adapt methods to practical applications
· Collaborate with data teams to refine the 3D aesthetic development pipeline, improve data collection and quality evaluation, and lay the foundations for high-quality scaling
· Explore integrating human structure priors (Skeleton / SMPL / Mesh) with multimodal cues such as depth, normals, and optical flow in reconstruction and generative models
· Assist in building data processing, evaluation, and visualization tools (e.g., for immersive video aesthetic editing) to support rapid iteration
· Enable high-quality projection of 3D features into 2D visual outputs, with the goal of producing A-level or above academic publications