Tesla Sr. Storage Engineer
Job Requirements
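Job description: Tesla is applying the latest wave of artificial intelligence to hard global problems such as transportation and energy. We are looking for a passionate storage engineer to join this highly innovative team and help the company accelerate the world's transition to sustainable energy. As part of Tesla's IT infrastructure team, we deliver always-on foundational storage services so that Tesla can design, build, and support its world-class products. In this critical role, you will work closely with upstream and downstream teams across complex IT systems to build and manage highly available, scalable storage platforms and ensure they are 100% compatible with our engineering, manufacturing, and application systems.
Responsibilities:
· Deploy, configure, and maintain distributed storage, including distributed block, object, and file storage.
· Ensure our storage infrastructure meets the needs of SDS (software-defined storage), HCI (hyperconverged infrastructure), hybrid cloud, and similar scenarios.
· Ensure the scalability and performance of storage and backup through capacity planning, performance control, and configuration tuning.
· Demonstrate a high level of technical expertise in supporting complex storage and backup equipment, including SAN, NAS, and backup solutions across all tiers.
· Participate in the technical evaluation of new requirements and in defining technical standards.
· Own configuration, maintenance, data replication, failure recovery, and data migration for the storage and backup environments.
· Respond quickly to urgent incidents and resolve problems efficiently to reduce system downtime.
· Write and maintain documentation and the knowledge base, and pass on experience and knowledge to junior engineers.
Minimum requirements:
· 8+ years of experience in storage and backup administration; able to work well under pressure, with a steady temperament and attention to detail.
· Experience working in large, complex enterprise data center environments, with extensive experience managing data at PB to EB scale.
· In-depth knowledge of data storage hardware and protocols, including but not limited to AHCI, SCSI, NVMe, FC SAN, InfiniBand, and RDMA.
· In-depth understanding of how to design, deploy, and support backup solutions for open systems, enterprise SAN and NAS, and private cloud architectures.
· Deep understanding of the technologies and best practices for data center consolidation, IaaS, private cloud, virtualization, and capacity planning.
· Hands-on experience with storage products such as Pure, VAST, and Dell EMC.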
Job Responsibilities
None
About the team
The Industrial Energy team designs the eyes, ears, and brains of Tesla’s Energy Storage (Megapack) products. These system boards control the central processing, communications, thermal systems, high voltage safety, and system-level components including breakers, contactors, and pyrofuses.
The Role
The Industrial Energy team is looking for a skilled and motivated individual to support the development, debugging, and continuous improvement of the Megapack PCBAs and factory test infrastructure. This person will serve as the first line of support for troubleshooting PCBA failures from factory test as well as field returns. They will also perform sustaining activities such as designing in alternate components, cost reductions, and design improvements. This person will interface with PCBA vendors and with Tesla staff in the supply chain, factory test, field service, and design engineering groups, which requires clear and organized communication.
Responsibilities
• Troubleshoot Megapack PCBA failures and drive corrective actions.
• Own the start-to-finish design of tester PCBAs to support factory test stations.
• Collaborate on the design and improvement of electronics test infrastructure hardware and software.
• Support design updates to Megapack PCBAs.
• Develop and execute test plans to validate circuit performance.
The Role
Tesla is offering a full-time IT Support DevOps AI position in the Information Technology Department (work location: Tesla Gigafactory Shanghai). If you are a versatile expert who integrates AI development and DevOps practices, someone who can efficiently tackle challenges, solve complex technical problems in user support and experience scenarios, and reject repetitive and inefficient work patterns, this role is perfect for you. IT Support DevOps AI is a core role connecting the company’s IT systems and user-facing processes, standing at the forefront of enhanced user support. You will work across multiple domains, including AI technology R&D, containerized deployment, and operational support. Through technical practice, you will help the company optimize user interactions, improve support efficiency, and contribute to the core goal of transforming the user experience.
Responsibilities
• Undertake AI algorithm R&D, model optimization, and training, with a strong emphasis on fine-tuning (FT), supervised fine-tuning (SFT), reinforcement learning (RL), and advanced tuning techniques; focus on user support scenarios such as data analysis, query resolution, issue detection, and automated assistance to ensure AI technology aligns with user experience needs.
• Deploy, monitor, and scale AI solutions on container technologies such as Kubernetes (K8s) and Docker, ensuring high availability and stability of the system in the operational environment, while integrating underlying AI technologies such as neural networks and Transformer architectures for efficient performance.
• Participate in DevOps process development; optimize the full lifecycle of AI model and system development, testing, and deployment; and implement automated deployment, continuous integration (CI), and continuous delivery (CD), incorporating RL-based optimization and model tuning for adaptive user support systems.
• Collaborate with user support-related departments such as helpdesk, customer service, and product teams to deeply understand user pain points and provide data-driven AI technical solutions, leveraging SFT and attention mechanisms to enhance personalized user experiences.
• Respond quickly to technical requirements and faults in user-facing systems, troubleshoot issues in AI systems, container clusters, and network environments, minimize impacts on user interactions, and improve support efficiency and satisfaction through advanced AI tuning and underlying model diagnostics.
• Track cutting-edge technologies in the AI and DevOps fields (e.g., large language models with FT/SFT/RL integration, cloud-native operations) and industry trends, promote the pre-research and application of new technologies in user support scenarios, and continuously optimize system performance using techniques like model compression and quantization.
The Role
Compute is the most important driver in accelerating the maturation of AI-enabled products. Today, Tesla is at the forefront of creating meaningful real-world products using AI. We design, build, and run large-scale GPU clusters that enable our teams to build better products faster. We are an extremely small team, and the work of every member carries an immense amount of weight. Working with the team, you will build performance testing tools, health check tools, tools for better metric collection, and other fun projects.
Responsibilities
You’ll be working in a cross-functional and highly versatile team that designs, implements, and maintains HPC technical stacks.
Leverage and improve upon existing cluster management solutions to ensure rapid deployment and scalability.
Ensure the reliability of the existing systems to guarantee uptime and availability of core foundational services.
Influence architectural decisions with a focus on security, scalability, and high performance.
Work with engineering teams to understand which metrics are useful to collect, and implement the corresponding monitoring and alerting with existing monitoring solutions.
Improve root cause analysis and corrective action for problems large and small: identify patterns and design task automations.
Help develop automated tools that collect information users can apply directly to root cause analysis of issues in their job submissions.
Organize and document implemented solutions for long-term information retention in our internal ticketing and documentation system.
Take part in a 24x7 on-call rotation.
Must
THE ROLE: Join the AMD AECG (Adaptive and Embedded Computing Group) as the leader of our China Customer Engineering team to further strengthen and grow the team. In this role, you will lead customer program engagements and deep customer co-engineering in support of Embedded x86 customers in the Greater China market. In this customer-facing role, you will collaborate with local FAEs and sales managers, global Customer Applications Engineering teams, R&D Engineering teams, and many other cross-functional stakeholders to ensure successful, on-time, and high-quality deployment of AMD Embedded x86 processors into customer designs, from evaluation through development and production. You will also build strong and deep relationships with customers' engineering leaders and be the influential voice of the customer internally. Key market segments are networking, storage, automotive, and edge AI.
THE PERSON: Brief description of what type of person would be successful in the role and key traits needed
KEY RESPONSIBILITIES:
Team Leadership: Lead a team of local Customer Application Engineers and other technical experts, some of whom may be remote, to engage with China customers in adopting and developing designs with AMD Embedded x86 processors. Build and grow the Greater China Customer Engineering team through hiring and team development.
Evaluations and Design-Wins: Engage deeply, both personally and through your team, with customers to understand their key care-abouts, enable hands-on evaluations, and build compelling technical and architectural engagements that win China customer designs, working closely with global teams.
Issue Resolution / Customizations: Oversee the triage, debugging, and resolution of customer issues, ensuring timely coordination with internal engineering and product teams, and drive issue closure. Build a strongly technical team that can create and deliver custom features in a self-contained fashion.
Escalation and Crisis Management: Serve as the primary escalation point for complex customer engineering challenges, driving resolution and customer satisfaction.
Technical Guidance: Provide training and support to customers and ODMs as they adopt AMD Embedded x86 processors, development tools, and design guides.
Customer Communication: Drive the team to create and review technical information disclosures, training materials, and other customer-facing documentation.
Resource & Onboarding Management: Direct hardware resource allocation, and continue to manage, develop, and grow a high-performing Customer Engineering team in China. Build a strong, competent team with the key expertise needed for emerging markets.
Deep Partnership and Co-engineering with Customers: Build a customer-obsessed team of strong technical engineers who can work in a deep co-engineering model with customers, so that this partnership builds a competitive moat.