DiDi AI - Foundation Model CN Algorithm Intern
Job Requirements
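Algorithm Engineer Intern (VLM/LLM track) - Foundation Models for Behavior Prediction and Planning
Responsibilities
* Design and implement foundation models for behavior prediction and motion planning in autonomous driving, building on Vision-Language Models (VLMs) and Large Language Models (LLMs).
* Use multimodal pretrained large models for trajectory generation and fusion, improving the foundation model's ability to understand and predict the intentions of other traffic participants.
* For on-vehicle and cloud deployment, carry out algorithm-level performance optimization, such as compression, pruning, distillation, and training and inference acceleration, to ensure model usability, real-time system performance, and efficient resource utilization.
* Work closely with the algorithm, software, and systems teams to drive model integration and deployment in simulation and on real vehicle platforms.
Requirements
* Bachelor's degree or above in computer science, artificial intelligence, machine learning, or a related field.
* Background in VLMs (e.g., the Qwen-VL or InternVL series) or LLMs (e.g., the GPT series, LLaMA), with experience in large-model pretraining, fine-tuning, inference, or engineering.
* Familiarity with how generative models work, with related engineering experience, e.g., DDPM or flow matching.
* Proficiency in at least one deep learning framework (JAX, PyTorch, TensorFlow) and familiarity with toolchains such as Hugging Face Transformers.
* Background in behavior prediction, time-series models, or imitation learning is a plus.
* Proficiency in Python and C++.
* Experience with model performance optimization (pruning/quantization/distillation, training and inference acceleration) is a plus.
* Strong teamwork and communication skills, and a passion for technical innovation.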
Job Responsibilities
None
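1. Build the core capabilities of foundation models and generative AI and bring them into production, including but not limited to text generation/translation, image-to-text, Deepfake, and efficient training/inference of large models; track the latest industry advances and conduct forward-looking technical research.
2. Lead the team in applying AIGC technologies to content understanding for commercial products such as advertising, e-commerce, short video, and live streaming, building a next-generation commercial ecosystem based on large models.
3. Own project planning, team building, and cross-team collaboration for the large-model algorithm team, and build an industry-leading content understanding algorithm team.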
• Design, implement, and optimize scalable ML training pipelines for training multimodal foundation models for robotics.
• Collaborate with researchers to integrate cutting-edge model architectures into scalable training pipelines.
• Implement scalable data loaders and preprocessors for multimodal datasets, such as videos, text, and sensor data.
• Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets.
• Develop robust monitoring and debugging tools to ensure the reliability and performance of training workflows on large GPU clusters.
The computer vision algorithm engineer will work in a dynamic team as part of the Video Engineering org, which develops on-device computer vision and machine perception technologies across Apple’s products. We balance research and product to deliver the highest quality, state-of-the-art experiences, innovating through the full stack, and partnering with cross-functional teams to influence what brings our vision to life and into customers' hands. Keywords: machine-learning-based ISP; low-level object detection and segmentation; multi-sensor fusion