Apple: AIML- Infrastructure Systems Engineer (Machine Learning), Machine Learning Platform Technologies

Experienced hire · Full-time · Machine Learning and AI · 2025-07-30 · Location: Shanghai · Status: Hiring

Qualifications

Minimum Qualifications
• Master's or PhD degree in Computer Science, Electrical Engineering or equivalent
• 5+ years of Systems or AIML production-service experience, commensurate with running cutting-edge hybrid cloud services in China and the rest of the world
• Solid understanding of system architecture and large-scale service or computational platform operations
• Demonstrated understanding of system management, covering aspects of configuration and usage accounting
• Proficiency in coding with scripting and programming languages, including Bash, Python, Golang and Java - while having the ability to select the proper language as a tool to solve a certain problem
• Experience in large-scale service and job deployment, using an orchestration framework (Kubernetes) and cloud services for large-scale projects
• Experience in observability of system behaviors (e.g. Prometheus, Grafana)

Preferred Qualifications
• Self-motivated and proactive, with demonstrated creative and critical thinking capabilities
• Ability to identify problems in depth, distinguishing purposes vs. measures without confusion
• Strong sense of thoroughness, driving details, delive…

Responsibilities

The Infrastructure Systems Engineer will do the following tasks, through collaboration with team members in China and around the world. 

-  Analyze the requirements, demands, constraints and challenges of machine learning in local or global environments, design or re-design platform architecture to improve its scalability and agility, and to enable new, high-impact use cases

-  Develop and implement the above design, bringing it to an internal product, with observability to support efficient system management

-  Design and/or enhance automation of operations for infrastructure and platforms, including tools and processes of monitoring, logging and alerting, to improve scalability in both system construction and runtime operations 

-  Support Dev and Eng efforts by providing operational solutions, co-design ML application architecture, and drive coordination among local and global, internal and cross-functional groups to achieve successful results

-  Create performance profiles for platforms and services, defining service level objectives (SLOs) and driving the measurement, monitoring and evaluation of these objectives

-  Lead continuous evaluation of system performance and reliability, discover potential faults, and drive RCA and fixes

Related Positions

Principal AI Specialist Solution Architect - Infrastructure

Experienced hire · Solution

- As an AIML Specialist Solutions Architect (SA) in AI Infrastructure, you will serve as the Subject Matter Expert (SME) for providing optimal solutions in model training and inference workloads that leverage Amazon Web Services accelerator computing services. As part of the Specialist Solutions Architecture team, you will work closely with other Specialist SAs to enable large-scale customer model workloads and drive the adoption of AWS EC2, EKS, ECS, SageMaker and other computing platforms for GenAI practice.

- You will interact with other SAs in the field, providing guidance on their customer engagements, and you will develop white papers, blogs, reference implementations, and presentations to enable customers and partners to fully leverage AI Infrastructure on Amazon Web Services. You will also create field enablement materials for the broader SA population, to help them understand how to integrate Amazon Web Services GenAI solutions into customer architectures.

- You must have deep technical experience working with technologies related to Large Language Models (LLMs), Stable Diffusion and many other SOTA model architectures, from model design and fine-tuning to distributed training and inference acceleration. A strong machine learning development background is preferred, in addition to experience in building applications and designing architectures. You will be familiar with the Nvidia ecosystem and related technical options, and will leverage this knowledge to help Amazon Web Services customers in their selection process.

- Candidates must have great communication skills and be very technical and hands-on, with the ability to impress Amazon Web Services customers at any level, from ML engineers to executives. Previous experience with Amazon Web Services is desired but not required, provided you have experience building large-scale solutions. You will get the opportunity to work directly with senior engineers at customers, partners and Amazon Web Services service teams, influencing their roadmaps and driving innovations.

Updated 2025-07-18 · Shanghai | Beijing | Shenzhen

Senior Software Engineer, Machine Learning & AI

Experienced hire · Hardware

• Design, develop, and deploy robust AI/ML systems with high-quality, scalable, and maintainable code
• Translate complex, ambiguous requirements into clear technical plans and lead project execution across engineering efforts
• Build scalable infrastructure and platforms to support cutting-edge machine learning workflows, including agentic systems that can plan, reason, and act autonomously
• Research and apply state-of-the-art ML techniques (including LLMs, custom model training, and RAG/agent-based architectures) to real-world hardware challenges
• Stay current with the fast-evolving AI/ML landscape, continuously improving our tools, systems, and methods to maintain a technical edge
• Provide technical mentorship, foster a culture of excellence and inclusion, and help grow team capabilities
• Lead design discussions, author technical documentation, and provide thoughtful, actionable feedback to peers
• Represent the team in executive reviews, product demos, retrospectives, and cross-functional forums

Updated 2025-10-15 · Shanghai

AIML- Infrastructure Systems Engineer (Machine Learning), Machine Learning Platform Technologies

Experienced hire · Machine


Updated 2025-10-15 · Beijing

AIML Infrastructure Systems Engineer Intern

Internship · Machine

The Infrastructure Systems Engineer Intern will do the following tasks, through collaboration with team members in China and around the world.

-  Analyze the requirements, demands, constraints and challenges of machine learning platform in local or global environments. Design or re-design platform architecture to improve its scalability and agility, and to enable new, high-impact use cases

-  Investigate new technologies to enhance system performance, reliability and redundancy. Create performance profile for platforms and services, defining service level objectives (SLO) and driving the measurement, monitoring and evaluation over these objectives

-  Improve automation of operations for infrastructure and platforms, including tools and processes of monitoring, logging and alerting, to improve scalability in both system construction and runtime operations

-  Develop and implement the above design, bringing it to an internal product, with observability to support efficient systems management

Updated 2025-10-29 · Beijing