
Tesla Sr. AI DevOps Engineer - IT Support

Experienced hire · Full-time · IT - Infrastructure & Operations · Location: Shanghai · Status: Hiring

Requirements


• Educational Background: Bachelor’s degree or above in relevant fields such as Computer Science, Artificial Intelligence, Software Engineering, or Electronic Engineering.
• Work Experience: At least 3 years of experience in AI development, DevOps, or automated deployment; experience in user support or IT operations (especially in tech or service-oriented environments) is preferred.
• Technical Competencies:
o Proficiency in the Python programming language, capable of independently writing efficient AI algorithm code, data processing scripts, and DevOps automation scripts, with expertise in implementing fine-tuning (FT), supervised fine-tuning (SFT), and reinforcement learning (RL) algorithms (see the fine-tuning sketch after this list).
o Mastery of Kubernetes (K8s) and Docker technologies, with practical experience in containerized deployment, cluster management, and operations in production environments, supporting AI models tuned for real-time user interactions.
o Familiarity with machine learning frameworks (e.g., TensorFlow, PyTorch), able to independently design, train, and optimize AI models for user support scenarios, including advanced tuning techniques and underlying technologies like neural networks, Transformer architectures, attention mechanisms, and model compression.
o Familiarity with CI/CD processes and toolchains (e.g., Jenkins, GitLab CI), and ability to use tools like Terraform and Ansible to implement automated building, testing, and deployment of AI systems with integrated tuning pipelines.
o Proficiency in SQL and NoSQL databases (e.g., MySQL, MongoDB, PostgreSQL) to enable efficient storage and querying of user support data, facilitating AI model training and reinforcement learning feedback loops.
• Soft Skills:
o Strong self-motivation, able to independently plan and complete work tasks with minimal supervision.
o Excellent communication skills, capable of smoothly collaborating with both technical and non-technical teams, and clearly conveying requirements and solutions.
o Outstanding problem analysis and solving abilities, able to quickly identify and resolve faults in AI systems and DevOps workflows in user-facing environments, including debugging AI underlying technologies.
• Preferred Qualifications:
o Work experience in user support or IT operations in tech or service industries, with familiarity with processes such as ticketing, escalation, and feedback loops.
o Experience with AI-driven technologies such as chatbots or user analytics applications (e.g., user behavior data collection, real-time personalization using RL and FT).
o Familiarity with agile development processes and experience participating in or leading agile projects involving AI model tuning and SFT.
o Kubernetes certifications (e.g., CKA, CKAD), cloud service provider certifications (e.g., AWS Certified DevOps Engineer), or AI-specific certifications (e.g., TensorFlow or PyTorch certifications covering advanced tuning).
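
To make the fine-tuning expectation above concrete, below is a minimal sketch of a supervised fine-tuning (SFT) loop in PyTorch with Hugging Face Transformers. The base model name, the toy ticket/resolution pairs, and the hyperparameters are illustrative assumptions only and are not taken from this job description.

    # Minimal SFT sketch for a support-assistant language model.
    # Model name, data, and hyperparameters are illustrative assumptions.
    import torch
    from torch.utils.data import DataLoader
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # hypothetical base model; any causal LM works the same way
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Toy (prompt, answer) pairs standing in for real ticket/resolution data.
    pairs = [
        ("User: The charging app keeps logging me out.\nAgent:",
         " Please update to the latest app version and re-pair the device."),
        ("User: How do I reset my account password?\nAgent:",
         " Use the 'Forgot password' link on the sign-in page."),
    ]
    texts = [prompt + answer + tokenizer.eos_token for prompt, answer in pairs]

    def collate(batch):
        enc = tokenizer(batch, return_tensors="pt", padding=True,
                        truncation=True, max_length=128)
        # Simplified: real pipelines usually mask prompt and pad tokens with -100.
        enc["labels"] = enc["input_ids"].clone()
        return enc

    loader = DataLoader(texts, batch_size=2, shuffle=True, collate_fn=collate)
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    model.train()
    for epoch in range(3):
        for batch in loader:
            outputs = model(**batch)   # forward pass computes cross-entropy loss
            outputs.loss.backward()    # backpropagate
            optimizer.step()
            optimizer.zero_grad()
        print(f"epoch {epoch}: loss={outputs.loss.item():.4f}")

In practice the same loop shape extends to RL-style feedback training by swapping the loss for a reward-weighted objective; this sketch only covers the supervised step.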

Job Description


The Role

Tesla is offering a full-time IT Support DevOps AI position in the Information Technology Department (work location: Tesla Gigafactory Shanghai). If you are a versatile engineer who combines AI development with DevOps practice, tackles challenges efficiently, solves complex technical problems in user support and user experience scenarios, and rejects repetitive, inefficient ways of working, this role is for you.
The IT Support DevOps AI role connects the company's IT systems with user-facing processes and sits at the forefront of delivering enhanced user support. You will work across multiple domains, including AI technology R&D, containerized deployment, and operational support, and through that technical practice help the company optimize user interactions, improve support efficiency, and advance the core goal of transforming the user experience.

Responsibilities

• Undertake AI algorithm R&D, model optimization, and training, with a strong emphasis on fine-tuning (FT), supervised fine-tuning (SFT), reinforcement learning (RL), and advanced tuning techniques; focus on user support scenarios such as data analysis, query resolution, issue detection, and automated assistance to ensure AI technology aligns with user experience needs.
• Complete the deployment, monitoring, and scaling of AI solutions based on container technologies such as Kubernetes (K8s) and Docker, ensuring high availability and stability of the system in the operational environment, while integrating underlying AI technologies such as neural networks and Transformer architectures for efficient performance (see the scaling sketch after this list).
• Participate in DevOps process development, optimize the full lifecycle of AI model and system development, testing, and deployment, and realize automated deployment, continuous integration (CI), and continuous delivery (CD), incorporating RL-based optimization and model tuning for adaptive user support systems.
• Collaborate with user support-related departments such as helpdesk, customer service, and product teams to deeply understand user pain points and provide data-driven AI technical solutions, leveraging SFT and attention mechanisms to enhance personalized user experiences.
• Respond quickly to technical requirements and faults in user-facing systems, troubleshoot issues in AI systems, container clusters, and network environments, minimize impacts on user interactions, and improve support efficiency and satisfaction through advanced AI tuning and underlying model diagnostics.
• Track cutting-edge technologies in the AI and DevOps fields (e.g., large language models with FT/SFT/RL integration, cloud-native operations) and industry trends, promote the pre-research and application of new technologies in user support scenarios, and continuously optimize system performance using techniques like model compression and quantization.
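
As a concrete illustration of the deployment and scaling work described above, here is a minimal sketch that adjusts replicas on a model-serving Deployment using the official Kubernetes Python client. The deployment name, namespace, and per-replica capacity are illustrative assumptions, not details of Tesla's environment.

    # Sketch: scale an AI inference Deployment from a simple load signal.
    # Deployment name, namespace, and thresholds are illustrative assumptions.
    from kubernetes import client, config

    def scale_inference_deployment(pending_requests: int,
                                   name: str = "support-bot-inference",
                                   namespace: str = "ai-support") -> int:
        """Pick a replica count from a crude load signal and apply it."""
        config.load_kube_config()  # use load_incluster_config() inside the cluster
        apps = client.AppsV1Api()

        current = apps.read_namespaced_deployment(name, namespace).spec.replicas or 1
        # Assume one replica comfortably handles roughly 50 pending requests.
        desired = max(1, min(10, (pending_requests // 50) + 1))

        if desired != current:
            apps.patch_namespaced_deployment_scale(
                name, namespace, body={"spec": {"replicas": desired}}
            )
        return desired

    if __name__ == "__main__":
        print("replicas:", scale_inference_deployment(pending_requests=180))

In a production pipeline this kind of logic would typically live behind a Horizontal Pod Autoscaler or custom metrics adapter rather than an ad-hoc script; the sketch only shows the API mechanics.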
Related topics (English materials included): DevOps, Python, algorithms, SFT, Kubernetes, Docker, TensorFlow, PyTorch, Transformer, CI, CD, Jenkins, GitLab, Terraform, Ansible, SQL, NoSQL, MySQL, MongoDB, PostgreSQL, AWS
Related Positions

Amazon · Experienced hire · Solution

Every day will bring new and exciting challenges on the job while you:
- Act as a strategic advisor for customers' Generative AI initiatives and internal AI agent innovation.
- Drive the development and implementation of collaborative AI agents within the TAM organization.
- Lead technical discussions around AWS AI services including Bedrock, Claude, and Amazon Q.
- Make recommendations on AI architecture, security, cost optimization, and operational excellence.
- Champion internal AI agent success stories to inspire customer innovation.
- Complete analysis and present periodic reviews of AI workload performance.
- Guide customers in developing responsible AI practices while ensuring security and compliance.
- Foster an ecosystem where AI and humans progress together through knowledge sharing.
- Work with AWS AI/ML service teams to advocate for customer needs.
- Participate in customer-requested meetings (onsite or via phone).
- Work directly with Amazon Web Services engineers to ensure rapid resolution of AI-related issues.
- Be available outside business hours to handle urgent issues.

Updated 2025-07-16
Microsoft · Experienced hire · Sales

You will lead and support your team as a people manager by fostering empowerment and accountability, guided by the principles of model, coach and care:
• You will lead teams in identifying and advancing new business opportunities, integrating impactful industry insights into customer engagements, and driving strategic projects and high-impact AI solution deployments that deliver measurable business value.
• You will guide your team in developing and executing opportunity strategies through effective orchestration, ensuring alignment with customer needs. This includes coaching on how to engage customers to uncover business challenges and facilitate meaningful solution discussions.
• You will coach your team on applying the orchestration model and support them in building a strong partner network to drive cross-sell and up-sell motions.
• Leveraging your technical and market expertise, you will mentor your team on connecting Microsoft solutions to customer outcomes and act as a thought leader in AI transformation conversations.
• You will define long-term customer satisfaction strategies, lead whitespace analysis, and participate in strategic territory planning. You'll ensure alignment across departments through regular ROB reviews and planning sessions.
• You will be accountable for achieving sales targets and maintaining operational excellence. This includes coaching your team on product and sales knowledge, ensuring completion of required training and certifications, and monitoring key performance metrics across the territory.

Updated 2025-10-13
Amazon · Experienced hire · Solution

Hiring location: Beijing, Shanghai, Guangzhou, Shenzhen, Hong Kong (visa sponsorship provided)
Would you like to join one of the fastest-growing teams within Amazon Web Services (AWS) and help shape the future of GPU optimization and high-performance computing? Join us in helping customers across all industries maximize the performance and efficiency of their GPU workloads on AWS while pioneering innovative optimization solutions. As a Senior Technical Account Manager (Sr. TAM) specializing in GPU Optimization in AWS Enterprise Support, you will play a crucial role in two key missions: guiding customers' GPU acceleration initiatives across AWS's comprehensive compute portfolio, and spearheading the development of optimization strategies that transform customer workload performance.
Key Job Responsibilities
- Build and maintain long-term technical relationships with enterprise customers, focusing on GPU performance optimization and resource allocation efficiency on AWS cloud or similar cloud services.
- Analyze customers' current architecture, models, data pipelines, and deployment patterns; create a GPU bottleneck map and measurable KPIs (e.g., GPU utilization, throughput, P95/P99 latency, cost per unit).
- Design and optimize GPU resource usage on EC2/EKS/SageMaker or equivalent cloud compute, container, and ML services; implement node pool tiering, Karpenter/Cluster Autoscaler tuning, auto scaling, and cost governance (Savings Plans/RI/Spot/ODCR or equivalent).
- Drive GPU partitioning and multi-tenant resource sharing strategies to reduce idle resources and increase overall cluster utilization.
- Guide customers in PyTorch/TensorFlow performance tuning (DataLoader optimization, mixed precision, gradient accumulation, operator fusion, torch.compile) and inference acceleration (ONNX, TensorRT, CUDA Graphs, model compression); see the mixed-precision sketch after this posting.
- Build GPU observability and monitoring systems (nvidia-smi, CloudWatch or equivalent monitoring tools, profilers, distributed communication metrics) to align capacity planning with SLOs.
- Ensure compatibility across GPU drivers, CUDA, container runtimes, and frameworks; standardize change management and rollback processes.
- Collaborate with cloud provider internal teams and external partners (NVIDIA, ISVs) to resolve cross-domain complex issues and deliver repeatable optimization solutions.

Updated 2025-08-18
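
For context on the mixed-precision and gradient-accumulation tuning named in the posting above, here is a minimal PyTorch sketch using automatic mixed precision (AMP). The tiny model, random data, and accumulation step count are illustrative assumptions; it requires a CUDA GPU.

    # Sketch: mixed-precision training with gradient accumulation in PyTorch.
    # Model, data, and hyperparameters are illustrative assumptions.
    import torch
    import torch.nn as nn

    device = "cuda"
    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler()   # scales the loss to avoid fp16 underflow
    loss_fn = nn.CrossEntropyLoss()
    accum_steps = 4                        # accumulate gradients over 4 micro-batches

    for step in range(100):
        x = torch.randn(32, 512, device=device)
        y = torch.randint(0, 10, (32,), device=device)

        with torch.cuda.amp.autocast():    # run the forward pass in mixed precision
            loss = loss_fn(model(x), y) / accum_steps

        scaler.scale(loss).backward()      # backward on the scaled loss
        if (step + 1) % accum_steps == 0:
            scaler.step(optimizer)         # unscales gradients, then optimizer.step()
            scaler.update()
            optimizer.zero_grad(set_to_none=True)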
Microsoft · Experienced hire · Customer

• Engage with customer IT and business leaders to understand their application, data, and AI priorities, and design secure, scalable solutions that drive business value and customer satisfaction.
• Lead technical engagements across architecture design, Proof of Concepts (POCs), and Minimum Viable Products (MVPs) to accelerate adoption of Azure AI, App Services, GitHub, and data platforms.
• Own the end-to-end technical delivery results, ensuring completeness and accuracy of consumption and customer success plans in collaboration with the CSAM.
• Drive next best actions and generate incremental pipeline from each engagement, aligning with Unified Enterprise Support (ES) priorities.
• Deliver repeatable intellectual property (IP) and contribute to centralized IP development to accelerate deployment and achieve targeted outcomes.
• Provide delivery oversight and escalation support for key Factory engagements across AI and App Innovation projects.
• Lead the health, resiliency, security, and optimization of mission-critical workloads, ensuring readiness for production-scale AI use cases.
• Act as the Voice of the Customer by sharing insights and feedback with engineering teams to influence product improvements and remove adoption blockers.
• Support customer skilling through technical workshops, readiness activities, and recommendations that ensure solution performance, maintainability, and reliability.
• Maintain deep technical expertise and stay current with Azure, AI, GitHub, and cloud-native development trends, while contributing to internal and external technical communities.
• Be accredited and certified to deliver with advanced and expert-level proficiency in priority workloads including Azure AI Foundry, AKS, App Service, Cosmos DB, Azure SQL, PostgreSQL, APIM, and GitHub.
• Demonstrate a growth mindset by continuously aligning your skills to customer needs, contributing to knowledge sharing, and mentoring others to accelerate customer outcomes.

Updated 2025-10-10