Apple Reliability Engineer
Job Requirements
Minimum Qualifications
• BS/MS or equivalent experience in Mechanical Engineering / Electrical Engineering / Image Science, Photographic OR Motion-Picture
• Extensive experience in consumer product development and debugging, or in reliability testing
• Proficient in English and Chinese, and excellent communication…
Responsibilities
• Lead operations reliability testing (ORT) and failure analysis in the development and sustaining stages
• Drive and manage standard methodologies and findings across data sets to improve ORT and failure analysis efficiency
• Research and strategically implement iPhone system and module reliability testing to continuously improve system quality and the field product usage model during the mass production stage
• Drive alignment and transition of reliability testing from NPI to MP
• Work with commodity managers and supplier quality engineers to ensure component quality and reliability
• Make recommendations to improve design, process, and test
• Strategically identify and prioritize issues found in ORT
• Lead failure analysis of issues found in ORT and coordinate across functions to drive corrective actions through deep FA
• Communicate technical updates succinctly to the executive team
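- Site Reliability Engineer, responsible for the reliable, stable, and efficient operation of Baidu AI Cloud's networking services
- Define the network operations roadmap, work in depth across the operations subdomains (incidents, capacity, change, cost, etc.), and deliver platform-based operations solutions
- Take part in designing and developing efficient operations platforms and tools, continuously improving operational efficiency
- Track relevant industry technology trends and identify key opportunities for technical innovation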
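Site Reliability Engineering (SRE) combines software and systems engineering to build highly scalable, highly available distributed systems.
1. Ensure the reliability and uptime of multiple core big data and compute systems, while keeping system cost and stability in view.
2. Build automated operations solutions for large-scale systems; work with the system development teams to safeguard reliability across the whole lifecycle, from system design through launch.
3. Improve system visibility by monitoring component availability and performance metrics, helping the system developers and the team locate faults quickly.
4. Drive improvements in service reliability and scalability, along with cost and performance optimization, to uphold the system SLA.
5. Take part in designing and implementing automation platforms that support rapid iteration of large-scale production clusters.
6. Based on business usage scenarios, refine and deliver service governance best practices, including but not limited to analyzing performance bottlenecks on critical paths, troubleshooting business issues, and driving high-availability architecture upgrades.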
1. Ensure the stable, reliable, and efficient operation of Xiaomi's global business, maintaining high availability of services at all times.
2. Own core operational tasks such as resource provisioning and management, incident response, capacity management, monitoring, and reliability improvements.
3. Review technical architecture designs, assess their soundness, and proactively identify and resolve reliability risks.
4. Conduct in-depth analysis of systemic deficiencies, identify bottlenecks, and develop optimization strategies; plan and execute projects to improve system reliability and ensure the cost-effectiveness and high availability of the systems.
5. Participate in the 24/7 on-call rotation, promptly responding to and resolving production incidents to ensure service availability.
6. Analyze and improve processes to build stable, highly available systems; drive continuous automation improvements and minimize manual intervention.