Tesla Sr. IT Incident Response Engineer
Requirements
Must
• Minimum 5 years of relevant experience; bachelor's degree or above in Information Technology, Software Engineering, Computer Science, or equivalent.
• Fluent English; strong communication, sense of responsibility, problem-solving, and teamwork.
• Solid IT infrastructure knowledge (networking, servers, virtualization, storage, Kubernetes/application services); hands-on experience preferred.
• Major incident / on-call experience in an enterprise environment.
• Strong operations observability: Splunk, Prometheus/Alertmanager, synthetic monitoring, Grafana (or equivalents); familiarity with ITSM ticketing practices.
• Ability to le…
Job Responsibilities
THE ROLE
This role is a senior support position within Tesla IT Infrastructure Engineering & Operations. The Incident Response team provides incident response and management support to global cross-functional engineering teams, helping maintain high availability for Tesla Manufacturing, Business Operations, Customer Service & Experience. We reduce incident occurrence through effective IT operations monitoring, risk analysis, and change management. The Tesla APAC Incident Response Center (IRC) is a growing team of professionals from diverse backgrounds, with strong development opportunities. This role is based at Giga Factory Shanghai, China, and provides global support as Tesla's business and mission scale.
Senior engineer team positioning: Acts as the regional incident management lead; coordinates teams through investigation and resolution, owns incident management practices (ticket management, root cause investigation, data analysis, and management reporting), and continuously improves processes and tooling.
RESPONSIBILITIES
• Independently lead end-to-end, 24×7 closed-loop incident management to minimize impact and optimize response time; organize emergency response plans, post-incident reviews, and drills as needed.
• Lead or drive IT service management initiatives; establish or optimize SOPs to reduce cross-team communication barriers, promote technical and skills sharing, and raise the team's incident response capability.
• Oversee IT Infrastructure & Operations monitoring and operational processes; maintain day-to-day stability and provide periodic reporting on data centers, servers, networks, applications, and related systems; identify and mitigate risks early.
• Proactively support team operational improvement without day-to-day supervision, including tool iteration, process optimization, and adoption of industry best practices, to accelerate operational efficiency.
• Participate in Infrastructure & Operations daily operations and change management; control change risk, improve change workflows, and support execution of change events.
• Use company-approved AI tools for continuous learning and innovation to empower the organization.
THE ROLE
This role is a support engineer position within the Tesla IT Infrastructure Engineering & Operations department. The Sr. Incident Response Engineer coordinates with cross-functional engineering teams on incident response and management to maintain high availability for Tesla Manufacturing, Business Operations, Customer Service & Experience. We help reduce the occurrence of incidents through efficient IT operations monitoring, effective risk analysis, and professional team collaboration. The Tesla APAC Incident Response Center is a growing team consisting of professionals from diverse backgrounds, which offers a strong development environment. This role is based at Giga Factory Shanghai, China, but supports Tesla's business globally as the business and mission grow.
RESPONSIBILITIES
• Independently lead incident response and management to minimize impact and ensure optimal response times. Develop incident response plans, conduct post-mortem analyses, and organize drills to enhance preparedness.
• Drive IT service management projects. Establish or optimize SOPs to reduce inter-team communication barriers, promote technical knowledge sharing, and improve team incident response capabilities.
• Monitor IT infrastructure and data center operations, including servers, networks, and applications. Analyze real-time stability metrics, mitigate risks, and deliver regular operational analysis reports.
• Proactively enhance team efficiency through tool automation, process refinement, and adoption of industry best practices. Support daily operations and foster a culture of continuous improvement.
• Oversee infrastructure changes to minimize risks, streamline approval workflows, and ensure compliance with change management protocols.
The Role
Tesla is looking for a technical and industry-experienced engineer to join a team of talented engineers. As part of the Tesla IT Operation team, we are responsible for delivering 7x24 system infrastructure and providing a portfolio of services, including configuration management, engineering tools, identity access and control, and management of public and private cloud infrastructure. Security and extreme reliability are our fundamental design principles. The candidate must be hands-on on a day-to-day basis, with experience in building, operating, and driving reliability and security for production systems at scale.
Responsibilities
• Design, deploy, and support manufacturing systems and network infrastructure.
• Provide support for China-based infrastructure build-out, including the datacenter and Linux systems (both virtualized and bare-metal servers).
• Install, configure, and maintain the Linux server environment.
• Ensure the reliability of existing systems to guarantee uptime and availability of core infrastructure services.
• Perform root-cause analysis of complex issues across hardware, operating system, application, network, and information security platforms.
• Work with different business units to identify, plan, test, and deploy or upgrade Linux systems according to business requirements.
• Partner with teams from across the organization to help tackle hard problems in a collaborative, high-velocity environment.
• Tackle issues across the entire stack: hardware, software, network, and application.
• Manage engineering tools and platforms such as GitHub, Artifactory, etc.
• Perform analysis, troubleshooting, and introspection on core infrastructure components and handle incident response.
• Create and maintain a well-documented knowledge base and mentor junior engineers.
• Take on an on-call role, respond quickly to emergency bridges, and provide quick and effective solutions to minimize system downtime.
The Role
As an IT Engineering & Delivery Engineer (TPM) at Tesla Giga Shanghai, you will own the end-to-end lifecycle management of IT software systems (spanning development, testing, and delivery) while also steering the planning and execution of IT infrastructure construction projects. You will partner closely with global and local technical teams to ensure seamless synergy between software products, manufacturing digital systems, and IT infrastructure, driving IT operations excellence and fueling growth across all IT-enabled business functions, all while upholding Tesla’s rigorous standards for quality and efficiency.
Responsibilities
1. Software System Development & Testing Leadership
• Own the development roadmap, test execution, and deployment of business-critical systems, including business support platforms, IoT applications, and edge computing solutions.
• Lead requirement gathering and solution design, bridging business stakeholders and development teams to translate on-site operational needs into actionable technical blueprints, alongside robust development plans and test strategies.
• Oversee the full software testing lifecycle, from unit, integration, and system testing to User Acceptance Testing (UAT). Establish standardized test case repositories and defect resolution workflows to ensure software functionality, performance, and security align with Tesla’s global benchmarks.
• Drive continuous software iteration and optimization, leveraging business feedback to enhance system adaptability and stability, and accelerate the factory’s digital transformation journey.
2. IT Infrastructure Construction & Cross-Functional Delivery
• Lead IT infrastructure planning and deployment across factory construction, expansion, and new production line rollouts, covering network architecture, data center buildouts, server provisioning, end-user devices, physical security systems, and IoT device integration.
• Coordinate cross-functional stakeholders (Engineering, Facilities, EHS, Manufacturing, and Security teams) to align IT infrastructure delivery timelines with master construction schedules, ensuring the hardware environment fully supports software deployment and operational demands.
• Manage outsourced construction vendors and on-site teams, enforcing strict controls over quality, safety, and cost. Champion on-site standardization, including wiring protocols, rack layout, asset labeling, and document management best practices.
• Oversee end-to-end integration and debugging between software systems and infrastructure, resolving compatibility issues to guarantee stable, post-deployment system performance.
3. Digital Project Execution & Technology Innovation
• Lead the rollout of AI, IoT, and automation-driven digital projects, such as intelligent terminal application development, robot fleet management systems, computer vision algorithm deployment, and predictive maintenance platform implementation, to embed cutting-edge technology into manufacturing operations.
• Collaborate with Tesla’s global IT teams to align system architecture, data security protocols, and technical standards, ensuring software and infrastructure solutions adhere to the company’s global technical framework.
• Serve as the critical link between technology and business, driving manufacturing process optimization through software feature enhancements and strategic hardware resource allocation, boosting production efficiency and digital management maturity.
4. Project Governance & Process Optimization
• Establish a unified project management framework covering the full lifecycle of both software (development-testing-delivery-operations) and infrastructure (planning-construction-acceptance) initiatives. Define clear milestones, budget guardrails, risk mitigation protocols, change management procedures, and documentation standards.
• Proactively identify lifecycle risks (such as software development delays, hardware compatibility gaps, or cross-team collaboration bottlenecks) and design targeted mitigation plans to ensure on-time, on-budget, and quality-compliant project delivery.
• Continuously refine project execution workflows, scaling Agile development, modular testing, and other lean methodologies to enhance cross-team collaboration efficiency and communication transparency.
• Own the long-term lifecycle management of IT projects, supporting post-launch system maintenance, version upgrades, and infrastructure modernization to ensure sustained, reliable IT support for business operations.