Apple Senior Software Engineer (MLOps)
Qualifications
Minimum Qualifications
• 6+ years of experience in the design and implementation of large-scale ML systems or distributed systems
• Experience with model pipeline and registry tools, detecting and preventing model drift, automating model monitoring, and ensuring model accuracy
• Proficiency in programming languages such as Python, Java, or Golang
• Effective communication skills in written and spoken English
• Bachelor's degree or above in Software Engineering, Computer Science, Machine Learnin…
Responsibilities
This role requires a blend of skills in software engineering, machine learning, and operations to ensure the smooth functioning of ML systems in production environments. In this role you will:
- Lead the team to design and implement automation for model training, testing, validation, and deployment
- Collaborate with machine learning engineers to ensure efficient deployment and scaling of ML models
- Implement monitoring and alerting systems to track model performance, system health, and data drift
- Optimize compute resources for cost and performance efficiency
- Manage model versions to ensure traceability and reproducibility
• Lead hands-on design and development efforts, primarily using Python, building robust, scalable, and customer-focused AI/ML solutions.
• Engage directly with key enterprise customers to strategize, architect, and implement AI-driven, agentic AI solutions leveraging Azure AI services, including Azure OpenAI and Azure ML.
• Translate complex requirements into practical, well-architected technical solutions.
• Develop rapid end-to-end prototypes covering data ingestion, validation, processing, and model deployment using Azure platform components.
• Build, customize, and optimize AI models and related components for customer-specific use cases.
• Integrate AI solutions with full-stack architectures, preferably leveraging experience with JavaScript frameworks (e.g., Node.js, React) and/or .NET ecosystems.
• Establish and maintain robust CI/CD and MLOps pipelines, leveraging Azure DevOps and GitHub for automated deployments.
• Proactively explore diverse datasets to engineer novel features and signals that significantly enhance ML performance.
• Participate actively in every phase of the model lifecycle, from conceptualization, training, fine-tuning, validation, and deployment to continuous monitoring and improvement.
• Develop and optimize the control stack, including locomotion, manipulation, and whole-body control algorithms;
• Deploy and evaluate neural network models in physics simulation and on real humanoid hardware;
• Design and maintain teleoperation software for controlling humanoid robots with low latency and high precision;
• Implement tools and processes for regular robot maintenance, diagnostics, and troubleshooting to ensure system reliability;
• Monitor teleoperators at the lab and develop quality assurance workflows to ensure high-quality data collection;
• Collaborate with researchers on model training, data processing, and the MLOps lifecycle.