Amazon Software Development Engineer, Data and AI Tech Team
Qualifications
Basic Qualifications
- 3+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
- 3+ years of non-internship professional software development experience
- Experience programming with at least one software programming language
- Experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
Preferred Qualifications
- Bachelor's degree in computer science or equivalent
- Knowledge of data engineering pipelines, cloud solutions, ETL management, databases, visualizations and analytical platforms
- Exposure to agentic…
Responsibilities
- Design, build, and operate production services and data pipelines across the DAT stack (agent platform, data products, business/sales tooling).
- Own features end-to-end, from spec and design through implementation, testing, deployment, and operations.
- Practice spec-driven, AI-native development: maintain system specs alongside code, and use agentic development workflows to ship faster.
- Collaborate with PM-Ts, data engineers, and business partners to turn requirements into reliable systems.
- Raise the operational bar: observability, data-quality safeguards, and incident response for the systems you own.
We aim to leverage AI and other leading technologies, and we are dedicated to providing safe and reliable risk control capabilities behind payments. The core technologies include rule engines, model engines, and intelligent algorithm models, among others. We are the leading platform with capabilities for high-concurrency real-time risk calculation and massive-scale data analysis and processing. As the core risk management tech platform for the global payment business, we adopt a multi-center deployment architecture around the world. Here you may have the opportunity to learn about and participate in the design and development of the following:
1. Extreme computing optimization at the millisecond level.
2. Behavior analysis and risk mining over massive data.
3. Global multi-center system architecture planning and high-availability solution design.
4. Design and R&D of risk control systems and big data platforms.
You will also have the opportunity to explore the architectural design and implementation of cutting-edge technologies such as privacy computing and large models in risk control systems.
* Large-Scale Training Pipelines: Design and implement distributed training pipelines for LLMs using tools such as Fully Sharded Data Parallel (FSDP) and DeepSpeed, ensuring scalability and efficiency
* LLM Customization & Fine-Tuning: Adapt LLMs for new languages, domains, and vision applications through continued pre-training, fine-tuning, and Reinforcement Learning from Human Feedback (RLHF)
* Model Optimization on AWS Silicon: Optimize AI models for deployment on AWS Inferentia and Trainium, leveraging the AWS Neuron SDK and developing custom kernels for enhanced performance
* Customer Collaboration: Interact with enterprise customers and foundation model providers to understand their business and technical challenges, co-developing tailored generative AI solutions
The Role
Tesla is offering a full-time IT Support DevOps AI position in the Information Technology Department (Work Location: Tesla Giga Factory Shanghai). If you are a versatile expert who integrates AI development and DevOps practices, someone who can efficiently tackle challenges, solve complex technical problems in user support and experience scenarios, and reject repetitive and inefficient work patterns, this role is perfect for you. IT Support DevOps AI is a core role connecting the company's IT systems and user-facing processes, standing at the forefront of enhanced user support. You will work across multiple domains, including AI technology R&D, containerized deployment, and operational support. Through technical practice, you will help the company optimize user interactions, improve support efficiency, and advance the core goal of user experience transformation.
Responsibilities
• Undertake AI algorithm R&D, model optimization, and training, with a strong emphasis on fine-tuning (FT), supervised fine-tuning (SFT), reinforcement learning (RL), and advanced tuning techniques; focus on user support scenarios such as data analysis, query resolution, issue detection, and automated assistance to ensure AI technology aligns with user experience needs.
• Deploy, monitor, and scale AI solutions on container technologies such as Kubernetes (K8s) and Docker, ensuring high availability and stability of the system in the operational environment, while integrating underlying AI technologies such as neural networks and Transformer architectures for efficient performance.
• Participate in DevOps process development, optimize the full lifecycle of AI model and system development, testing, and deployment, and implement automated deployment, continuous integration (CI), and continuous delivery (CD), incorporating RL-based optimization and model tuning for adaptive user support systems.
• Collaborate with user support-related departments such as helpdesk, customer service, and product teams to deeply understand user pain points and provide data-driven AI technical solutions, leveraging SFT and attention mechanisms to enhance personalized user experiences.
• Respond quickly to technical requirements and faults in user-facing systems, troubleshoot issues in AI systems, container clusters, and network environments, minimize impacts on user interactions, and improve support efficiency and satisfaction through advanced AI tuning and underlying model diagnostics.
• Track cutting-edge technologies in the AI and DevOps fields (e.g., large language models with FT/SFT/RL integration, cloud-native operations) and industry trends, promote the pre-research and application of new technologies in user support scenarios, and continuously optimize system performance using techniques like model compression and quantization.
Translate insights from AI research, academic papers, and industry updates into working prototypes and real-world applications. Design, build, and iterate on AI-powered applications, integrating APIs, vector databases, and agent frameworks. Develop and maintain evaluation datasets, define success metrics, and run systematic tests to guide iteration and improvement. Document experiments and prototypes clearly to ensure reproducibility and support team learning.