Xiaohongshu (Rednote) SRE Engineer - International

Experienced hire | Full-time | Operations development | 2026-04-09 | Location: Singapore | Status: Hiring

Requirements

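1. Reliability and SRE experience: Familiar with stability assurance for large-scale internet systems; experienced in high-availability architecture design, fault governance, capacity planning, and incident response. Experience on an SRE, platform engineering, or infrastructure team is preferred.
2. International architecture experience: Familiar with cross-region architecture design and disaster recovery, such as multi-region deployment, traffic scheduling, data synchronization, and failover. Experience with overseas business architecture or international infrastructure development is preferred.
3. Core technical skills: Familiar with Linux systems, networking, and the principles of common middleware (such as MySQL, Redis, and Kafka); understands cloud-native infrastructure (Kubernetes, Service Mesh, etc.) and observability (monitoring, logging, tracing).
4. Development and automation: Proficient in at least one programming language such as Python, Go, or Java, with experience building automated operations platforms, stability tooling, or infrastructure systems.
5. Problem analysis and collaboration: Strong problem analysis and troubleshooting skills, able to locate problems quickly in complex system environments; good communication skills and a sense of teamwork.
6. Language: Fluent in Chinese and English, able to communicate and collaborate on technical matters in an international team.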

Responsibilities

1. International Architecture & Disaster Recovery: Participate in the design and implementation of Rednote's international infrastructure architecture. Build and evolve cross-region architecture, disaster recovery, and high-availability capabilities. Drive critical services toward multi-region deployment, failover, and fault isolation to improve the overall stability of overseas operations.
2. Overseas Infrastructure Platform Development & Operations: Own the deployment, operations, and continuous optimization of core internal technical platforms (release systems, monitoring and alerting, configuration services, service management, traffic scheduling, etc.) in overseas regions. Ensure consistency and availability across overseas and domestic platform environments.
3. Reliability Engineering & Incident Response: Build and continuously improve the reliability framework for overseas business, including observability capabilities, incident response, root cause analysis, and post-mortem mechanisms. Lead cross-functional coordination during major incidents to restore services quickly and drive long-term systemic improvements.
4. International Technical Solution Delivery: Develop a deep understanding of overseas business requirements and architecture characteristics. Adapt infrastructure capabilities to overseas scenarios, including multi-region architecture design, network and data architecture optimization, and adaptation of foundational services.
5. Cross-functional Collaboration & Best Practice Development: Work closely with domestic infrastructure teams, product engineering teams, and platform teams to align overseas technical systems with domestic architecture standards. Consolidate overseas stability best practices and promote them across the organization.

Related positions

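Senior SRE Engineer - International Content Safety Platform

Experienced hire | A138855

1. Provide SRE solutions for TikTok's content safety area that fit real business scenarios.
2. Take a deep part in building and advancing disaster recovery capabilities, providing end-to-end disaster recovery solutions that ensure recovery capability under extreme failure scenarios.
3. Contribute to stability work of all kinds, including establishing disaster recovery standards, building, analyzing, and operating core metrics, identifying stability risks in business architecture, accepting disaster recovery drills, and building emergency-response processes and tools.

Updated 2026-04-21 | Shanghai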

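SRE Engineer (Disaster Recovery and Emergency Response) - International Content Safety Platform

Experienced hire | 5+ years | A185461A

1. Provide ongoing day-to-day stability support for multiple business lines within the international content safety platform, such as video safety and live-stream safety; build and optimize observability dashboards; take an active part in disaster recovery response and emergency handling; and continuously improve MTTR and SLA.
2. Maintain the stability of online services and meet service SLOs through systematic monitoring, operations, capacity management, resource cost management, cross-region disaster recovery, inspections, process standardization, emergency response, and incident management; also apply data-driven methods and automated operations to improve operational efficiency and stability operations capability.
3. Approach production issues with a systematic troubleshooting method, locate problems quickly, and establish an incident response mechanism.

Updated 2024-09-24 | Shanghai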

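SRE Engineer (Disaster Recovery and Emergency Response) - International Content Safety Platform

Experienced hire | 5+ years | A230181A

Updated 2024-09-24 | Beijing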

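SRE Engineer (Operations Planning) (Beijing/Shanghai/Shenzhen)

Experienced hire | 1-3 years | A182815A

1. Develop a deep understanding of how resources are used and managed when generative AI is applied in international short-video creation, social, and related business scenarios.
2. Design and deliver cost and resource management solutions, including but not limited to resource utilization monitoring and management, infrastructure resource and business capacity planning, demand and budget management, and resource management for major events in the international short-video business.
3. Build a complete resource monitoring system that tracks and manages the utilization and cost of GPU/CPU, storage, and other resources, and propose resource and cost optimizations.
4. Own pricing and planning for products related to service sales.
5. Lead and drive the resource management solutions above into product tooling to deliver automated, platform-level capabilities.

Updated 2025-01-07 | Shanghai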