
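Horizon Robotics (地平线) 【D-Robotics (地瓜机器人)】 Operations Engineer
Experienced hire, full-time, software track. Location: Beijing. Status: Hiring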
Requirements
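【Requirements】:
- Bachelor's degree or above; computer-related majors preferred;
- Familiar with the Linux operating system, with basic system troubleshooting and diagnostic skills;
- Familiar with Kubernetes fundamentals and common operations, and able to use common components (such as Deployment, Service, ConfigMap);
- Able to manage cluster resources with kubectl and Helm;
- Proficient in Shell scripting; a foundation in Python or Golang is preferred;
- Familiar with monitoring and logging tools such as Prometheus, Grafana, and Loki;
- Understanding of common services on cloud platforms such as Alibaba Cloud and AWS (ECS, SLB, S3, VPC);
- Responsible, with strong execution and a sense of teamwork.
【Bonus points】: (candidates with any one of the following are preferred)
- Experience building K8s clusters from scratch or maintaining them;
- Have used AI platform components such as Argo, Volcano, or Kubeflow;
- Familiar with tools such as Terraform / Ansible / Helm / Kustomize;
- Experience developing simple operations tools or scripts.
【Position highlights】:
- Work with real Kubernetes production clusters and build hands-on experience;
- Opportunity to take part in operations support for advanced technologies such as AI platforms (Argo / Volcano);
- Open team atmosphere and a modern tech stack, with support for developing full-stack DevOps skills;
- Systematic training and a growth path toward mid-level and senior SRE / DevOps roles.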
Responsibilities
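【Job responsibilities】:
- Take part in the day-to-day maintenance and management of Kubernetes clusters, including deployment, scaling, upgrades, and incident handling;
- Work with development teams on scheduling support for platform resources to keep business systems running stably;
- Implement operations processes and standards, and carry out routine platform changes;
- Take part in configuring and using monitoring, logging, and alerting systems to support problem diagnosis;
- Assist in using core services of cloud platforms (such as Alibaba Cloud and AWS) to complete resource configuration and changes;
- Use existing automation tools and make simple script improvements to them (Shell/Python).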
Includes English-language materials
Education+
Linux+
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Kubernetes+
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
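This tutorial introduces the basics of the Kubernetes cluster orchestration system. Each module contains some background information on major Kubernetes features and concepts, and includes an online tutorial for you to follow.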
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
Helm+
[English] Introduction to Helm
https://helm.sh/docs/intro/
Are you new to Helm? This is the place to start!
https://www.baeldung.com/ops/kubernetes-helm
In this tutorial, we’ll understand the basics of Helm and how they form a powerful tool for working with Kubernetes resources.
Bash+
[English] The Bash Guide
https://guide.bash.academy/
A quality-driven guide through the shell's many features.
https://www.youtube.com/watch?v=tK9Oc6AEnR4
Understanding how to use bash scripting will enhance your productivity by automating tasks, streamlining processes, and making your workflow more efficient.
Scripting+
[English] Scripting language
https://en.wikipedia.org/wiki/Scripting_language
https://zhuanlan.zhihu.com/p/571097954
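A script is usually interpreted rather than compiled. Scripting languages are typically simple, easy to learn, and easy to use, with the aim of letting programmers write programs quickly.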
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for complete beginners, with full examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Go+
https://www.youtube.com/watch?v=8uiZC0l4Ajw
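A complete tutorial for learning Golang! Under an hour from start to finish, including a full demonstration of building an API in Go. No filler, only what you need to know.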
Prometheus+
https://grafana.com/docs/grafana/latest/getting-started/get-started-grafana-prometheus/
Prometheus is an open source monitoring system for which Grafana provides out-of-the-box support.
https://prometheus.io/docs/tutorials/getting_started/
Prometheus is a system monitoring and alerting system.
Grafana+
AWS+
https://aws.amazon.com/
Amazon Web Services offers reliable, scalable, and inexpensive cloud computing services. Free to join, pay only for what you use.
S3+
https://aws.amazon.com/s3/getting-started/
You can use Amazon S3 to store and retrieve any amount of data at any time, from anywhere.
https://www.youtube.com/watch?v=tfU0JEZjcsg
Amazon S3 is the oldest and one of the most popular services on AWS.
Argo+
https://argo-cd.readthedocs.io/en/stable/understand_the_basics/
Before effectively using Argo CD, it is necessary to understand the underlying technology that the platform is built on.
https://www.youtube.com/watch?v=MeU5_k9ssrs
The ArgoCD chapter includes building a pipeline of dynamically updating & building a new application version using GitLab downstream pipeline feature.
Volcano+
[English] Tutorials
https://volcano.sh/en/docs/tutorials/
This section provides guidance to help you quickly get started with Volcano, from deploying a basic Volcano Job/Deployment, to integrating with Volcano Queues
Kubeflow+
https://huggingface.co/blog/turhancan97/building-your-first-kubeflow-pipeline
Kubeflow is an open-source platform designed to be end-to-end, facilitating each step of the Machine Learning (ML) workflow.
https://www.kubeflow.org/docs/started/introduction/
Kubeflow is the foundation of tools for AI Platforms on Kubernetes.
https://www.youtube.com/watch?v=6wWdNg0GMV4
In this walk-through I will show you how I've created a machine learning pipeline with Kubeflow 1.5 using Jupyter Notebooks, Kubeflow pipelines, MinIO and Kserve.
Terraform+
https://developer.hashicorp.com/terraform/tutorials
Build, change, and destroy infrastructure with Terraform. Start here to learn the basics of Terraform with your favorite cloud provider.
https://www.youtube.com/watch?v=_45W3Z8XWL4
In this video you will learn the basics of using Terraform.
Ansible+
https://docs.ansible.com/ansible/latest/getting_started/index.html
Ansible automates the management of remote systems and controls their desired state.
DevOps+
https://roadmap.sh/devops
Step by step guide for DevOps, SRE or any other Operations Role in 2025
https://zhuanlan.zhihu.com/p/562036793
In DevOps, Dev stands for Development and Ops stands for Operations. In one sentence, DevOps breaks down the barrier between development and operations and integrates the two.
Related positions

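Experienced hire, 5-10 years, business development track
1. Responsible for customer management and sales in the North China region for the Robotics Chip Solutions business unit
2. Develop customer strategies, maintain customer relationships, win customer orders, and take responsibility for the product line's business growth
3. Deeply understand and drive the deployment of Horizon's chip and algorithm solutions in customer scenarios
4. Manage and drive distributor channels to expand the customer base and deliver projects
5. Work with the marketing team on business expansion across the broader robotics field
Updated 2025-06-13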

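Experienced hire, 1-3 years, software track
1. Develop system software for AI chips, including the OS kernel, BSP, middleware, and related platform toolchains
2. Develop multimedia middleware and frameworks for AI chips, covering image, video, BPU (NPU), and display
3. Design, develop, and verify subsystem/module software before chip tape-out, and handle bring-up and functional debugging once the chips come back
Updated 2024-04-10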