XPeng Motors (小鹏汽车) Research Scientist - Reinforcement Learning
Requirements
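Education: Master's degree or above, with a background in robotics, control, artificial intelligence, computer science, automation, or a related field.
Technical skills: A solid theoretical foundation in reinforcement learning and familiarity with mainstream algorithms (e.g., PPO, TD3, SAC, and Behavior Cloning). Familiarity with PyTorch, Isaac Gym, Isaac Lab, MuJoCo…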
Responsibilities
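Reinforcement learning algorithm research and optimization: Design and implement reinforcement learning algorithms suited to humanoid robots (e.g., PPO, SAC, TD3, RLHF). Explore methods such as imitation learning and hierarchical reinforcement learning to improve training efficiency and generalization.
Simulation environment construction and training/debugging: Build high-fidelity simulation environments with Isaac Gym, Isaac Lab, MuJoCo, and similar tools. Build a closed-loop RL training system spanning perception to control, including modules for reward design, state definition, and termination conditions. Train and debug humanoid robot skills in simulation, such as walking, standing, running, walking up and down slopes, and obstacle avoidance.
Algorithm evaluation and system optimization: Design general-purpose evaluation metrics to assess policy stability, convergence speed, robustness, and related properties. Systematically optimize the training pipeline (e.g., parallel sampling, distributed training, reparameterization).
Collaboration with the robot hardware team: Drive simulation-to-reality (Sim2Real) deployment, and take part in transferring and debugging policies on real humanoid robots. Participate in system integration and debugging, including control interface adaptation and policy deployment.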
We empower our people to stay resilient and relevant in a constantly changing world. We're looking for people who are always searching for creative ways to grow and learn. People who want to make a real impact, now and in the future. Does that sound like you? Then it seems like you'd make a great addition to our vibrant international team.
DAI AIX – AI Acceleration and Exploration conducts cutting-edge research in data analytics and AI with Siemens' global technology network, and provides consulting, co-creation, and data-driven applications for end customers. The Research Scientist conducts applied research on Industrial AI applications within the team.
We are seeking a Reinforcement Learning (RL) Specialist to lead the design, implementation, and optimization of RL-driven systems for post-training of foundation models. The primary focus of this role is advancing our RL capabilities for real-world applications such as industrial control systems and LLM agents. You will develop cutting-edge algorithms, improve post-training efficiency, and deploy scalable RL solutions in industry.
You'll make an impact by:
1. Reinforcement learning development for post-training:
• Design and implement state-of-the-art RL algorithms (e.g., PPO, SAC, DQN) for post-training of foundation models such as LLMs and time series foundation models.
• Implement distributed RL training pipelines using frameworks such as Ray RLlib, DeepSpeed, or custom solutions.
• Design and implement benchmark pipelines for model evaluation.
2. Aligning foundation models such as LLMs and time series foundation models with specific domains and tasks through techniques such as SFT and RL.
3. Coding & Infrastructure:
• Write production-grade Python code using PyTorch, NumPy, and pandas.
• Manage Linux-based clusters for distributed training and deployment.
4. Providing any other support required by the line manager.
• Design and implement advanced LLM-based architectures and agentic systems for real-world product scenarios.
• Lead model training and evaluation efforts, including data preprocessing, fine-tuning, and inference optimization.
• Collaborate across teams to deliver robust, scalable models aligned with product objectives and user value.
• Apply and adapt research ideas to solve practical challenges in reasoning, planning, memory, and alignment.
• Monitor and improve model performance post-deployment through data-driven iteration and error analysis.
• Contribute to technical discussions, model reviews, and best practices within the applied science community.
• Owns the science roadmap for grounding (including retrieval, re-ranking, attribution, and reasoning), driving initiatives from problem framing to production impact. Designs and evolves state-of-the-art retrieval and RAG orchestration across documents, tables, code, and images.
• Builds citation and provenance systems (e.g., passage highlighting, quote-level alignment, confidence scoring) to reduce hallucinations and increase user trust. Leads experimentation and evaluation using A/B testing, interleaving, NDCG, MRR, precision/recall, and calibration curves to guide measurable trade-offs.
• Advances tool-augmented grounding through schema-aware retrieval, function calling, knowledge graph joins, and real-time connectors to databases, cloud object stores, search indexes, and the web. Partners with platform engineering to productionize models with scalable inference, embedding services, feature stores, caching, and privacy-compliant multi-tenant systems.
• Nurtures collaborative relationships with product and business leaders across Microsoft, influencing strategic decisions and driving business impact through technology. Authors white papers, contributes to internal tools and services, and may publish research to generate intellectual property.
• Bridges the gap between researchers (e.g., Microsoft Research) and development teams, applying long-term research to solve immediate product needs. Leads high-stakes negotiations to ensure cutting-edge technologies are applied practically and effectively.
• Identifies and solves significant business problems using novel, scalable, and data-driven solutions. Shapes the direction of Microsoft and the broader industry through pioneering product and tooling work.
• Mentors applied scientists and data scientists, establishing best practices in experimentation, error analysis, and incident review. Collaborates cross-functionally with PMs, research, infrastructure, and security teams to align on milestones, SLAs, and safety protocols.
• Communicates clearly through design documentation, progress updates, and presentations to executives and customers. Contributes to ethics and privacy policies, identifies bias in product development, and proposes mitigation strategies.
1. Generative AI Model Development:
- Design and develop generative AI models, including language models, image generation models, and multimodal models.
- Explore and implement advanced techniques in areas such as transformer architectures, attention mechanisms, and self-supervised learning.
- Conduct research and stay up to date with the latest advancements in generative AI.
2. Data Acquisition and Preprocessing:
- Identify and acquire relevant data sources for training generative AI models.
- Develop robust data preprocessing pipelines that ensure data quality and cleanliness and comply with ethical and regulatory standards.
- Implement techniques for data augmentation, denoising, and domain adaptation to enhance model performance.
3. Model Training and Optimization:
- Design and implement efficient training pipelines for large-scale generative AI models.
- Leverage distributed computing resources, such as GPUs and cloud platforms, for efficient model training.
- Optimize model architectures, hyperparameters, and training strategies to achieve superior performance and generalization.
4. Model Evaluation and Deployment:
- Develop comprehensive evaluation metrics and frameworks to assess the performance, safety, and bias of generative AI models.
- Collaborate with cross-functional teams to ensure the successful deployment and integration of generative AI models into client solutions.
5. Collaboration and Knowledge Sharing:
- Collaborate with data engineers, software engineers, and subject matter experts to develop innovative solutions leveraging generative AI.
- Contribute to the firm's thought leadership by presenting at conferences and participating in industry events.