Tesla Sr. Site Reliability Engineer
Requirements
• 5+ years of experience managing web-scale infrastructure in a production *nix environment.
• Extensive hands-on experience administering and tuning Kubernetes clusters in production, including the use of ArgoCD, Helm, and other cloud-native orchestration tools.
• Deep understanding of container-based workloads and cloud-native architectures.
• Advanced skills in Linux administration, system internals, networking stack, filesystems, resource scheduling, and process management.
• Advanced experience with configuration management and Infrastructure-as-Code tools such as Ansible, Terraform, or Puppet.
• Proven experience with AWS or other major cloud infrastructure providers.
• Proficiency in high-level programming languages (Python, Go, Ruby, and/or Java).
• Strong track record of p…
Responsibilities
THE ROLE
Tesla's Platform Engineering is looking for a Site Reliability Engineer to join our team. As a member of the team, you will build and maintain Kubernetes clusters using infrastructure-as-code tools like Ansible, Terraform, ArgoCD and Helm, and help the application teams be successful on our platform. The underlying infrastructure is a mix of on-premises VMs, bare metal hosts and public clouds such as AWS located all around the globe, which presents unique challenges and the opportunity to work with different types of infrastructure technologies. A successful candidate will be expected to possess expert knowledge of Linux fundamentals, architecture and performance tuning, as well as software development skills to match. Experience running Kubernetes in production will be a strong plus. We prefer Golang or Python for any automation or tools we have to build along the way. We are the team that runs production-critical workloads for every aspect of the business at Tesla and sets the standards for other teams: a group of well-rounded generalists who not only solve the hardest problems in the industry but also push other engineering teams at large to be better.
RESPONSIBILITIES
• Manage our Kubernetes clusters on-prem and in the cloud to support our growing workloads.
• Participate in the architecture design process and in troubleshooting live applications with the product teams.
• Participate in a 24x7 on-call rotation.
• Influence architectural decisions with a focus on security, scalability and high performance.
• Set up and maintain monitoring, metrics and reporting systems for fine-grained observability and actionable alerting.
• Author technical documentation for workflows, processes and best practices.
• Develop and implement the Automation Maintenance vision, policies, processes, procedures, maintenance standards and maintenance strategy.
• Set up the automation maintenance team.
• Manage an onsite team of maintenance engineers and technicians to provide 24/7 support.
• Develop a Planned Preventative Maintenance (PM) schedule and ensure the scheduled preventative maintenance is carried out.
• Develop the Corrective Maintenance (CM) procedure and process, and ensure it is followed and adapted as required.
• Manage the team to analyze issues, find root causes, propose solutions, and implement them.
• Set department objectives/KPIs and assess the ongoing performance of direct reports.
• Manage Automation Maintenance department headcount.
• Establish a planned maintenance system that includes reports to track maintenance activity and identify recurring faults, spare part inventory and spending.
• Identify and implement an ongoing programme of improvements to increase availability, increase productivity and reflect changing business needs.
• Gather and analyze Automation Maintenance data, generate reports and make presentations to update the management team and stakeholders on Automation Maintenance issues, status, potential implications and risks, and to recommend improvements.
• Work collaboratively, negotiate and engage with key stakeholders to align with the maintenance strategy.
• Work with HSSE and operations to ensure the maintenance work is carried out safely for the Maintenance Team and for other staff.
• Develop a network of contractors and suppliers able to support our business 24/7.
• Prepare the annual Maintenance budget.
• Oversee management of the inventory system and develop a critical spare parts list.
• Ensure the efficient and effective use of the maintenance budget to maintain adequate spare parts inventory, taking into account local and regional availability of stock, delivery times, site visits, and planned PM and CM requirements.
• Manage the process of disposal of obsolete parts, plant and equipment.
• Support any after-hours operational needs and activities as required.
• Support new equipment installation, testing and commissioning, and the takeover of new equipment from the supplier according to contractual requirements, with quality and within schedule.
• Other assigned tasks as requested.
• Execute facility program and planning in all Lazada Logistics facilities, such as Preventive Program Maintenance, Corrective Maintenance, and facility service checklist implementation (maintenance, cleaning, pest control, and disinfectant).
The STM (Site Technical Manager) is the single-threaded owner for manufacturing technical issues for assigned projects with our Contract Manufacturers (CMs), focusing on:
- Ensure technical readiness for product ramp and serve as manufacturing engineering owner for product PVT and mass production.
- Drive manufacturing test flow optimization and process and yield improvements to exceed our volume production goals.
- Define the manufacturing process qualification criteria, manage the qualification activities, and complete the documentation.
- Lead our efforts on root cause and corrective action for all manufacturing-process-related issues, diving deep to analyze the problems found both in and out of the factory.
- Participate in design and planning through DFM and line balancing, as well as process yield, capacity and cost modeling.
- Work with the CM's engineering teams to identify and escalate manufacturing challenges by enforcing DFM and DFT principles.
- Review and approve fixture designs and qualification (FATP, device build).
- Review and execute Manufacturing Test Coverage Documents to ensure new products launch.
- Work with the sustaining engineering team on opportunities to improve our product, and own the technical readiness of PRQ in CMs.
- Periodically audit the CM's manufacturing process quality management system.
THE ROLE In this high-visibility leadership role, you will serve as the overall program owner driving the definition, execution, and productization of AMD’s Desktop, Notebook, and Workstation platforms with customers. You will lead cross-functional engineering teams to accelerate platform readiness, strengthen ecosystem capability, and deliver competitive, scalable AMD-based solutions to market. You will establish an execution framework that enables China-based customers to design, validate, and ship AMD platforms faster, with higher quality, and at lower development cost. This role is central to AMD’s growth strategy.