Tesla Senior/Staff Site Reliability Engineer, Fleetnet
Experienced hire · Full-time · Software Platform · Location: Shanghai · Status: Hiring
Requirements
Must have:
5+ years building and maintaining SaaS infrastructure with a healthy mix of….
Expert skills with Linux, networking, storage and virtualization automation, using tools like Kubernetes, Terraform, Ansible, Chef, et al.
Setting up and supporting CI/CD.
Proficiency in a high-level language like Python, Go, Ruby and/or Java.
Scaling through data-driven capacity planning, within both physical data centers and cloud infrastructure (AWS, GCP or Azure nice to have).
Troubleshooting and full-cycle incident response (mitigation, correction, prevention).
Strong belief in spreading (and acquiring) knowledge through mentorship and acting like an owner.
Smart but humble, with a bias for action and for enabling others' success.
Proficient in both Chinese and English.

This job application may involve an interview with an interviewer outside of Tesla China. By completing your application, you agree that Tesla may provide your application information to overseas interviewers at Tesla, Inc. for recruitment purposes. For more details and contact information, please see here: https://app.mokahr.com/social-recruitment/tesla/46129#/
Job Description
THE ROLE
We're the small, expert team creating the next-generation server-side infrastructure to support the manufacturing and functionality of fleets of Tesla products, and we're looking for seasoned SREs with domain expertise in one or more of: containers, public clouds and cloud-native apps.

Today, Tesla owners rely on our services to safely and securely summon their cars with a tap on their mobile phones, a feature enabled by one of the many over-the-air updates we've delivered to the Tesla vehicle fleet. Tesla engineering relies on our data and analytics platform to make Tesla products better and safer. And, when an owner needs assistance, Tesla service and support rely on our applications to understand and respond to the situation.

Tomorrow, we will apply fleet learning to dispatch and deliver real-time road conditions to millions of autonomous vehicles and manage distributed energy generation and storage at grid scale.

Join us and you will work alongside world-class software and data engineers on some of the newest and most challenging IoT, manufacturing and service engineering problems in the world today. The platform you help us build and automate will be used daily by millions of Tesla owners (and tens of thousands of Tesla employees) to improve and enhance the functionality of our cars, chargers, and batteries worldwide.

RESPONSIBILITIES
Design and write software that enables rapid prototyping by development teams, while ensuring the highest levels of reliability and availability.
Work directly with our factory firmware team to provide highly available factory-facing services.
Drive the migration of large-scale, distributed fleet applications towards cloud-native microservices.
Influence architectural decisions with a focus on security, scalability and high performance.
Automate the build and deployment of infrastructure using Docker, Kubernetes and other orchestration technologies in a hybrid-cloud environment.
Set up and maintain monitoring, metrics and reporting systems for fine-grained observability and actionable alerting.
Includes English-language materials
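SaaS
https://www.ibm.com/cn-zh/think/topics/saas
Software as a service (SaaS) is a cloud-based software delivery model in which a service provider hosts applications and makes them available to users over the internet.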
Linux
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Kubernetes
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
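This tutorial covers the basics of the Kubernetes cluster orchestration system. Each module contains some background information on major Kubernetes features and concepts, and includes an online tutorial for you to follow.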
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
Terraform
https://developer.hashicorp.com/terraform/tutorials
Build, change, and destroy infrastructure with Terraform. Start here to learn the basics of Terraform with your favorite cloud provider.
https://www.youtube.com/watch?v=_45W3Z8XWL4
In this video you will learn the basics of using Terraform.
Ansible
https://docs.ansible.com/ansible/latest/getting_started/index.html
Ansible automates the management of remote systems and controls their desired state.
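CI
https://www.ibm.com/cn-zh/think/topics/continuous-integration
Continuous integration (CI) is a software development practice in which developers regularly integrate new code and code changes into a central code repository throughout the development cycle. It is a key component of DevOps and agile methodologies.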
https://www.youtube.com/watch?v=42UP1fxi2SY
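CD
https://www.redhat.com/zh-cn/topics/devops/what-is-ci-cd
CI/CD stands for continuous integration and continuous delivery/deployment, and aims to streamline and accelerate the software development lifecycle.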
https://www.youtube.com/watch?v=R8_veQiYBjI&list=PLy7NrYWoggjzSIlwxeBbcgfAdYoxCIrM2
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for complete beginners, with full examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
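Go
https://www.youtube.com/watch?v=8uiZC0l4Ajw
A complete tutorial for learning Golang! From start to finish in under an hour, including a full demonstration of how to build an API in Go. No filler, just what you need to know.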
Ruby
https://www.ruby-lang.org/en/documentation/quickstart/
This is a small Ruby tutorial that should take no more than 20 minutes to complete.
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
AWS
https://aws.amazon.com/
Amazon Web Services offers reliable, scalable, and inexpensive cloud computing services. Free to join, pay only for what you use.
Azure
https://azure.microsoft.com/
Invent with purpose, realize cost savings, and make your organization more efficient with Microsoft Azure’s open and flexible cloud computing platform.