
Tesla (Senior/Staff) Site Reliability Engineer, Fleetnet

Experienced hire · Full-time · Software Platform · Location: Shanghai · Status: Hiring

Requirements


Must
5+ years building and maintaining SaaS infrastructure with a healthy mix of….
Expert skills with Linux, networking, storage and virtualization automation with tools like Kubernetes, Terraform, Ansible, Chef, etc.
Setting up and supporting CI/CD.
Proficiency in a high-level language like Python, Go, Ruby and/or Java.
Scaling through data-driven capacity planning within both physical data centers and cloud infrastructure (AWS, GCP or Azure experience is nice to have).
Troubleshooting and full-cycle incident response (mitigation, correction, prevention).
Strong belief in spreading (& acquiring) knowl…

Job Responsibilities


THE ROLE
We're the small, expert team creating the next-generation server-side infrastructure to support the manufacturing and functionality of fleets of Tesla products, and we're looking for seasoned SREs with domain expertise in one or more of: containers, public clouds and cloud-native apps.
Today, Tesla owners rely on our services to safely and securely summon their cars with a tap on their mobile phones -- a feature enabled by one of the many over-the-air updates we've delivered to the Tesla vehicle fleet. Tesla engineering relies on our data and analytics platform to make Tesla products better and safer. And, when an owner needs assistance, Tesla service and support rely on our applications to understand and respond to the situation. Tomorrow, we will apply fleet learning to dispatch and deliver real-time road conditions to millions of autonomous vehicles and manage distributed energy generation & storage at grid scale.
Join us and you will work alongside world-class software and data engineers on some of the newest and most challenging IoT, manufacturing and service engineering problems in the world today. The platform you help us build and automate will be used daily by millions of Tesla owners (and tens of thousands of Tesla employees) to improve and enhance the functionality of our cars, chargers, and batteries worldwide.

RESPONSIBILITIES
Design and write software that enables rapid prototyping by development teams, while ensuring the highest levels of reliability and availability.
Work directly with our factory firmware team to provide highly available factory-facing services.
Drive the migration of large-scale, distributed fleet applications towards cloud-native microservices.
Influence architectural decisions with a focus on security, scalability and high performance.
Automate the build and deployment of infrastructure using Docker, Kubernetes & other orchestration technologies in a hybrid-cloud environment.
Set up and maintain monitoring, metrics & reporting systems for fine-grained observability and actionable alerting.
Related Positions

hello · Experienced hire · 3+ years · Technology

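1. Responsible for quality assurance of mid-/back-office and client-side products;
2. Manage the stability and reliability of each test environment;
3. Design and execute test cases, analyze and locate the root cause of bugs, and track issues through to resolution;
4. Write automated test scripts and code to improve testing efficiency;
5. Complete non-functional, performance, compatibility, and security quality work as required;
6. Familiar with testing of internet projects, proficient with common testing tools and methods, and able to proactively dig into technology.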

Updated 2025-02-05 · Shanghai
hello · Experienced hire · 3-5 years · Software R&D

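Job Description
Participate in developing requirements for the intelligent-driving map platform: dig into and analyze business requirements, write technical proposals and system designs, and develop the related code;
Take a deep part in the architecture design and implementation of map-related services, analyze and identify optimization points in the system, and drive improvements in the soundness, reliability, and availability of those services;
Qualifications
Solid Java fundamentals; proficient in multithreaded programming; familiar with distributed systems, caching, message queues and similar mechanisms; familiar with the JVM, including the memory model, class-loading mechanism, and performance tuning;
In-depth understanding of open-source frameworks such as SpringBoot and Dubbo; proficient in relational database design and SQL; proficient in Unix/Linux operating systems;
Good ability to identify key business requirements and design domain models;
Positive and optimistic, good at communication and teamwork; candidates with an entrepreneurial mindset and innovative spirit preferred
Experience designing and stabilizing large-scale distributed, high-concurrency, high-load, high-availability systems preferred
Senior Development Engineer, Autonomous Driving Map Platform
Core Responsibilities
Map-related engineering service development: design distributed architectures, process TB/PB-scale multi-source map data, and build an automated update platform that supports tens of thousands of concurrent requests.
Cloud route-planning engine R&D: develop path algorithms dedicated to intelligent driving, adapt them to large-scale road networks, integrate dynamic information, achieve lane-level planning, and support online requests from millions of terminals;
High availability and performance assurance: build a high-availability architecture (cross-region disaster recovery, multi-active), locate bottlenecks with monitoring and optimization tools, improve stability, resolve extreme-scenario issues, and support high-concurrency testing and disaster-recovery drills.
Technology delivery and collaboration: lead the rollout of the cloud technology stack (Go/Java/C++, K8s, Docker, etc.);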

Updated 2026-02-03 · Hangzhou | Shanghai
momenta · Experienced hire · 8+ years

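1. Responsible for driver development of the Camera and Display modules on Linux, QNX and other systems for in-vehicle intelligent-driving projects;
2. Responsible for performance and stability optimization of these modules, and for driving the mass-production delivery of camera- and display-related low-level software;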

Updated 2025-11-05 · Shanghai
netease · Experienced hire · 3+ years · NetEase Corporate Functions

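1. Support product operations for the entire department, including day-to-day management and maintenance of the Linux operating system and base services such as Nginx, KVM, DNS, DHCP, and ES;
2. Troubleshoot and handle fault alerts on Linux, perform related system tuning, and continuously improve the monitoring and alerting system;
3. Participate in developing automated operations tools and platforms to increase automation and reduce the risk of manual operations; use these tools and platforms to extend operations capabilities to other teams and improve overall efficiency;
4. Own the system-level server architecture design for business products; implement and maintain high-availability, high-performance system architectures to ensure service stability, treating reliability as a core goal of system design;
5. Work closely with development, networking, and other teams to provide system-level technical support and solutions;
6. Continuously optimize operations practices and processes, build and maintain thorough technical documentation, build a team knowledge base, and promote experience sharing and knowledge transfer;
7. Follow cutting-edge industry technology trends, solve production issues with new operations techniques and methods, and improve the team's operations quality.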

Updated 2025-05-08 · Hangzhou