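Alibaba Cloud | Alibaba Cloud Intelligence - Cloud-Native K8s SRE Platform R&D Engineer/Expert - Hangzhou
Experienced hire, full-time, 3+ years, Cloud Intelligence Group; Location: Hangzhou; Status: Hiring
Requirements
1. Degree in computer science or a related field, with at least 3 years of back-end or cloud-native platform development experience; strong troubleshooting and system design skills, and awareness of stability risks;
2. Proficient in Java and Go, with a deep understanding of Kubernetes architecture and core components (such as API Server, etcd, kubelet, and kube-proxy), and familiar with ecosystem tools such as Helm, Istio, Prometheus, and Fluentd;
3. …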
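Responsibilities
1. Design, develop, and maintain a Kubernetes-based automated operations management platform that improves control over resource costs, safeguards business stability, and raises operational efficiency;
2. Use Go/Java proficiently to develop platform services and underlying Kubernetes component capabilities;
3. Take part in building the platform's high availability, performance optimization, security hardening, and automated operations systems;
4. Apply AI techniques to intelligently address container-level problem diagnosis, cost governance, alert noise reduction, and similar problems;
5. Write high-quality, maintainable technical documentation, and drive the team's knowledge accumulation and standardization.
Includes English-language materials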
Back-end development
https://www.youtube.com/watch?v=tN6oJu2DqCM&list=PLWKjhJtqVAbn21gs5UnLhCQ82f923WCgM
Learn what technologies you should learn first to become a back end web developer.
System design
https://roadmap.sh/system-design
Everything you need to know about designing large scale systems.
https://www.youtube.com/watch?v=F2FmTdLtb_4
This complete system design tutorial covers scalability, reliability, data handling, and high-level architecture with clear explanations, real-world examples, and practical strategies.
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
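Go
https://www.youtube.com/watch?v=8uiZC0l4Ajw
A complete tutorial for learning Golang! Under an hour from start to finish, including a full demo of how to build an API in Go. No filler, just what you need to know.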
Kubernetes
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
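This tutorial (Chinese edition) introduces the basics of the Kubernetes cluster orchestration system. Each module contains background information on major Kubernetes features and concepts, and includes an online tutorial for you to work through.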
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
Helm
[English] Introduction to Helm
https://helm.sh/docs/intro/
Are you new to Helm? This is the place to start!
https://www.baeldung.com/ops/kubernetes-helm
In this tutorial, we’ll understand the basics of Helm and how they form a powerful tool for working with Kubernetes resources.
Istio
https://istio.io/latest/docs/examples/microservices-istio/
This modular tutorial provides new users with hands-on experience using Istio for common microservices scenarios, one step at a time.
https://www.freecodecamp.org/news/learn-istio-manage-microservices/
In a world without Istio, one service makes direct requests to another and in case of failures, the service is responsible for handling those.
Prometheus
https://grafana.com/docs/grafana/latest/getting-started/get-started-grafana-prometheus/
Prometheus is an open source monitoring system for which Grafana provides out-of-the-box support.
https://prometheus.io/docs/tutorials/getting_started/
Prometheus is a system monitoring and alerting system.
Fluentd
https://docs.fluentd.org/
Fluentd is an open-source data collector for a unified logging layer.
[English] Guides and Recipes
https://www.fluentd.org/guides
Here is a growing collection of Fluentd resources, solution guides and recipes.
https://www.youtube.com/watch?v=Gp0-7oVOtPw
In today's episode, we take a look at the basics of Fluentd.
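Related positions
Experienced hire, A81609
1. Ensure the stability of Volcano Engine's cloud-native container platform products, continuously improving cloud product system stability through platform building, architecture optimization, organizational improvement, and similar means;
2. Ensure the stability of the container platform and large-scale container clusters, and carry out reliability analysis and optimization; analyze business architecture and system runtime behavior in depth, continuously identify stability weak points, tackle hard technical problems, and improve the overall stability of the system's core paths;
3. Take part in planning and building the operations and control platform for Volcano Engine's cloud-native container platform products; design and implement the related automated operations, analysis and diagnosis, and assurance systems, building an automated operations and control system for ultra-large-scale, multi-region clusters.
Updated 2025-06-10, Hangzhou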
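Experienced hire, A98480A
1. Ensure the stability of Volcano Engine's cloud-native container platform products, continuously improving cloud product system stability through platform building, architecture optimization, organizational improvement, and similar means;
2. Ensure the stability of the container platform and large-scale container clusters, and carry out reliability analysis and optimization; analyze business architecture and system runtime behavior in depth, continuously identify stability weak points, tackle hard technical problems, and improve the overall stability of the system's core paths;
3. Take part in planning and building the operations and control platform for Volcano Engine's cloud-native container platform products; design and implement the related automated operations, analysis and diagnosis, and assurance systems, building an automated operations and control system for ultra-large-scale, multi-region clusters.
Updated 2025-06-10, Beijing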
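Experienced hire, A48924
1. Ensure the stability of Volcano Engine's cloud-native container platform products, continuously improving cloud product system stability through platform building, architecture optimization, organizational improvement, and similar means;
2. Ensure the stability of the container platform and large-scale container clusters, and carry out reliability analysis and optimization; analyze business architecture and system runtime behavior in depth, continuously identify stability weak points, tackle hard technical problems, and improve the overall stability of the system's core paths;
3. Take part in planning and building the operations and control platform for Volcano Engine's cloud-native container platform products; design and implement the related automated operations, analysis and diagnosis, and assurance systems, building an automated operations and control system for ultra-large-scale, multi-region clusters.
Updated 2025-06-10, Shanghai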