
Meitu SRE - Xiamen
Experienced hire | Full-time | 3+ years | Meitu Yifu (美图宜肤) Business Unit | Location: Xiamen | Status: Hiring
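Requirements
1. Able to use English as a working language;
2. Bachelor's degree or above in computer science or a related field, with 3+ years of internet operations experience;
3. In-depth understanding of Linux, familiarity with protocols such as TCP/IP, HTTP, and HTTPS, and solid knowledge of networking and operating systems;
4. Familiar with the operations, administration, and performance tuning of MySQL/Redis/MongoDB, with DBA-level skills;
5. Familiar with operations practices, with hands-on experience using automation/orchestration tools such as Ansible and Terraform;
6. Familiar with open-source observability/monitoring tools such as OpenTelemetry, Prometheus, Thanos, Grafana, ELK, Sentry, and SkyWalking;
7. Familiar with Docker, Kubernetes, and the surrounding container ecosystem;
8. Understanding of performance analysis and optimization for common technology stacks such as PHP and Golang.
Preferred qualifications:
1. Experience maintaining overseas/international businesses;
2. Familiarity with operations processes or DevOps, and experience developing tooling systems for large-scale internet services;
3. Experience running and maintaining large-scale Kubernetes clusters in production;
4. Operations or development experience on public clouds such as AWS/Azure/GCP/Alibaba Cloud/Huawei Cloud;
5. Understanding of cloud native, service governance, microservice architecture, and AIOps.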
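Responsibilities
1. Own the stability of online services and keep the business running reliably 24/7;
2. Help manage and maintain foundational services such as monitoring systems, container clusters, load balancing, logging clusters, and big data clusters;
3. Provide operations support across the full business lifecycle, including but not limited to architecture reviews, CI/CD, business changes, monitoring coverage, capacity management, and performance optimization;
4. Handle day-to-day operations and administration of mainstream databases such as MySQL, Redis, and MongoDB.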
Includes English-language materials
Education+
Linux+
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
TCP/IP+
[English] What is TCP/IP?
https://www.techtarget.com/searchnetworking/definition/TCP-IP
TCP/IP stands for Transmission Control Protocol/Internet Protocol and is a suite of communication protocols used to interconnect network devices on the internet.
HTTP+
https://developer.mozilla.org/zh-CN/docs/Web/HTTP
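The Hypertext Transfer Protocol (HTTP) is an application-layer protocol for transmitting hypermedia documents such as HTML. It was designed for communication between web browsers and web servers, but it can also be used for other purposes.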
MySQL+
https://juejin.cn/post/7190306988939542585
An all-in-one, hands-on MySQL learning roadmap covering database fundamentals, SQL statements, database constraints, design, and more.
[English] MySQL Tutorial
https://www.mysqltutorial.org/
your go-to resource for mastering MySQL in a fast, easy, and enjoyable way.
https://www.youtube.com/watch?v=5OdVJbNCSso
MySQL SQL tutorial for beginners
https://www.youtube.com/watch?v=7S_tz1z_5bA
This beginner-friendly course teaches you SQL from scratch.
Redis+
[English] Developer Hub
https://redis.io/dev/
Get all the tutorials, learning paths, and more you need to start building—fast.
https://www.runoob.com/redis/redis-tutorial.html
REmote DIctionary Server (Redis) is a key-value store written by Salvatore Sanfilippo. It is a cross-platform, non-relational database.
https://www.youtube.com/watch?v=jgpVdJB2sKQ
In this video I will be covering Redis in depth from how to install it, what commands you can use, all the way to how to use it in a real world project.
MongoDB+
https://learnxinyminutes.com/mongodb/
MongoDB is a NoSQL document database for high volume data storage.
https://studio3t.com/academy/#courses
The fastest way to learn MongoDB
https://www.youtube.com/watch?v=c2M-rlkkT5o
This video will give you an introduction to MongoDB in 1 hour. Afterwards I recommend exploring aggregation, replication, and sharding.
https://www.youtube.com/watch?v=ExcRbA7fy_A&list=PL4cUxeGkcC9h77dJ-QJlwGlZlTd4ecZOA
You'll learn how to use MongoDB (a NoSQL database) from scratch. You'll also learn how to integrate it into a simple Node.js API.
Ansible+
https://docs.ansible.com/ansible/latest/getting_started/index.html
Ansible automates the management of remote systems and controls their desired state.
Terraform+
https://developer.hashicorp.com/terraform/tutorials
Build, change, and destroy infrastructure with Terraform. Start here to learn the basics of Terraform with your favorite cloud provider.
https://www.youtube.com/watch?v=_45W3Z8XWL4
In this video you will learn the basics of using Terraform.
OpenTelemetry+
https://logz.io/learn/opentelemetry-guide/#overview
Every journey in Observability begins with instrumenting an application to emit telemetry data – primarily logs, metrics and traces – from each service as it executes.
[English] Getting Started
https://opentelemetry.io/docs/languages/go/getting-started/
This page will show you how to get started with OpenTelemetry in Go.
https://www.youtube.com/watch?v=hLvwoow3XTk
OpenTelemetry can help, with its powerful capabilities for monitoring and analyzing hybrid applications, including collecting and analyzing telemetry data, metrics, and traces.
https://www.youtube.com/watch?v=Txe4ji4EDUA
In the observability space, the project making this possible is OpenTelemetry.
Prometheus+
https://grafana.com/docs/grafana/latest/getting-started/get-started-grafana-prometheus/
Prometheus is an open source monitoring system for which Grafana provides out-of-the-box support.
https://prometheus.io/docs/tutorials/getting_started/
Prometheus is a system monitoring and alerting system.
Grafana+
ELK+
https://logz.io/learn/complete-guide-elk-stack/
With millions of downloads for its various components since first being introduced, the ELK Stack is the world’s most popular log management platform.
https://www.baeldung.com/ops/elk
In this tutorial, we’ll learn about the basics of the ELK stack.
https://www.youtube.com/watch?v=jk4RoEYCZTo
Explains how to install and configure the ELK (Elasticsearch, Logstash, Kibana) Stack, a log management solution for analyzing and visualizing your data.
Sentry+
https://docs.sentry.io/product/sentry-basics/
Sentry is a developer-first error tracking and performance monitoring platform.
https://www.youtube.com/watch?v=cl8tPBI4qUc
Learn the basics of frontend Javascript error monitoring with Sentry.
https://www.youtube.com/watch?v=DzhVEK65eYg
Learn the basics of backend error monitoring with Sentry and recent updates to the issue experience.
Docker+
https://www.youtube.com/watch?v=GFgJkfScVNU
Master Docker in one course; learn about images and containers on Docker Hub, running multiple containers with Docker Compose, automating workflows with Docker Compose Watch, and much more. 🐳
https://www.youtube.com/watch?v=kTp5xUtcalw
Learn how to use Docker and Kubernetes in this complete hands-on course for beginners.
Kubernetes+
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
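This tutorial covers the basics of the Kubernetes cluster orchestration system. Each module contains some background information on major Kubernetes features and concepts, and includes an online tutorial for you to work through.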
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
PHP+
https://www.learn-php.org/
PHP is a server scripting language, and a powerful tool for making dynamic and interactive Web pages.
https://www.youtube.com/watch?v=l4_Vn-sTBL8
This PHP full course for beginners will teach you everything from scratch—from PHP basics to advanced concepts!
Go+
https://www.youtube.com/watch?v=8uiZC0l4Ajw
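A complete tutorial for learning Golang! Start to finish in under an hour, including a full demonstration of how to build an API in Go. No filler, just what you need to know.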
DevOps+
https://roadmap.sh/devops
Step by step guide for DevOps, SRE or any other Operations Role in 2025
https://zhuanlan.zhihu.com/p/562036793
In DevOps, Dev stands for Development and Ops stands for Operations. In one sentence, DevOps breaks down the wall between development and operations so that the two work as one.
AWS+
https://aws.amazon.com/
Amazon Web Services offers reliable, scalable, and inexpensive cloud computing services. Free to join, pay only for what you use.
Azure+
https://azure.microsoft.com/
Invent with purpose, realize cost savings, and make your organization more efficient with Microsoft Azure’s open and flexible cloud computing platform.
Service governance+
https://cloudnativecn.com/blog/istio-traffic-management-series-service-management-concept-theory/
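After reading this article, readers should have an initial understanding of Istio traffic management concepts and the related knowledge framework.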
https://juejin.cn/post/6844904006033080334
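Service governance mainly includes service discovery, load balancing, rate limiting, circuit breaking, timeouts, retries, and service tracing. The topic covered here is service discovery.
Microservices+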
https://learn.microsoft.com/en-us/training/modules/dotnet-microservices/
Microservice applications are composed of small, independently versioned, and scalable customer-focused services that communicate with each other by using standard protocols and well-defined interfaces.
https://microservices.io/
Microservices - also known as the microservice architecture - is an architectural style that structures an application as a collection of two or more services.
https://spring.io/microservices
Building small, self-contained, ready to run applications can bring great flexibility and added resilience to your code.
https://www.ibm.com/think/topics/microservices
Microservices, or microservices architecture, is a cloud-native architectural approach in which a single application is composed of many loosely coupled and independently deployable smaller components or services.
https://www.youtube.com/watch?v=CqCDOosvZIk
https://www.youtube.com/watch?v=hmkF77F9TLw
Learn about software system design and microservices.
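Related positions
Experienced hire | 5+ years | Operations Engineer
1. Ensure the high availability, reliability, and performance of the company's systems, applications, and services; design, implement, and maintain monitoring systems to detect and resolve potential issues promptly;
2. Respond to and resolve production incidents quickly to keep systems running normally;
3. Develop and maintain automation tools to make system deployment, configuration, and monitoring more efficient;
4. Analyze system resource usage and carry out capacity planning so that systems can meet business growth.
Updated 2025-09-07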
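Experienced hire | 3+ years | ACG
- Responsible for resource delivery, capacity management, and the architecture design of large-scale distributed clusters for the Baidu Cloud CDN & edge computing platform, building an industry-leading resource platform
- Responsible for release practices on the Baidu Cloud CDN & edge computing automated operations platform, implementing end-to-end CI/CD control, and building an intelligent operations platform to improve efficiency and product/service stability
- Responsible for business operations and building the metrics system for Baidu Cloud CDN & edge computing
- Responsible for delivering solutions for key Baidu Cloud CDN & edge computing customers and for quality tuning
Updated 2025-03-31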
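Experienced hire | 3+ years | ACG
- Develop the online services and automation tools for the financial joint-modeling product, keeping services reliable, stable, and efficient, safeguarding service stability and data quality, and meeting the product SLA
- Design stability solutions for financial joint-modeling online services on top of Baidu's existing infrastructure, covering prevention, loss mitigation, degradation, capacity management, elastic deployment, failure analysis, traffic allocation, and performance tuning
- Take part in designing the deployment and runtime architecture for the product's online services and model products, and lead the implementation of reliability-related automation systems to meet strict quality and efficiency requirements
- Use Baidu's existing infrastructure and open-source technologies to design and implement product monitoring systems, disaster-tolerance strategies, and disaster recovery plans; respond to and handle production emergencies to minimize service disruption
- Follow cutting-edge industry developments; optimize and evolve the online prediction system for large-scale machine learning models, and explore and apply newly adopted technologies
Updated 2024-10-29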