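T-Head (平头哥) DevOps Engineer
Campus recruitment | Full-time | T-Head Fall 2026 graduate recruitment | Location: Shanghai | Status: Hiring
Requirements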
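We are looking for candidates who have:
• A bachelor's degree or above in computer science or a related field, a passion for technology, and a solid programming foundation (familiarity with Python, Go, Shell, or similar languages preferred);
• A basic understanding of Linux, container technology (Docker), and orchestration systems (Kubernetes);
• A strong interest in artificial intelligence and machine learning; candidates who understand basic algorithms or have relevant project experience (such as anomaly detection, log analysis, or automated decision-making) will be given priority consideration;
• Good communication skills, learning ability, and team spirit, and a desire to build deep expertise at the intersection of AI and systems engineering.
Bonus points:
• Familiarity with MLOps and AIOps concepts and toolchains (such as MLflow and Kubeflow), or participation in open-source projects;
• Experience applying AI to solve practical engineering problems in coursework, competitions, or research.
We offer:
• The opportunity to take a deep part in innovation that merges AI and DevOps, with exposure to cutting-edge technical practice;
• One-on-one mentoring from senior engineers and a technical environment that supports rapid growth;
• An open, collaborative team culture that encourages innovation and supports your career development.
Join us to redefine DevOps with AI, making systems smarter and R&D more efficient!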
Responsibilities
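We are looking for passionate, forward-looking new graduates to join our team, which is dedicated to building intelligent R&D infrastructure. As an AI-enabled DevOps development engineer, you will take part in developing and optimizing the next generation of intelligent CI/CD platforms and automated operations systems, bringing in AI techniques to improve the automation, observability, and self-healing capabilities of the software development process.
You will take part in:
1. Building intelligent CI/CD pipelines: work closely with R&D teams to design and develop highly available, scalable continuous integration and continuous delivery platforms; explore applying machine learning to scenarios such as build failure prediction, intelligent test case recommendation, and resource scheduling optimization, to improve R&D efficiency and delivery quality.
2. Developing intelligent automated operations tools: use Python, Go, and other languages to develop automation scripts and tools and implement infrastructure as code (IaC); combine AI techniques to explore AIOps capabilities such as log anomaly detection and automated root cause analysis, enabling intelligent decision-making and response in system operations.
3. Building intelligent monitoring and self-healing systems: help build a full-link monitoring system that integrates technology stacks such as Prometheus, Grafana, and ELK; introduce time-series forecasting models and anomaly detection algorithms (such as LSTM and Isolation Forest) to deliver early warning of performance bottlenecks, automated fault diagnosis, and self-healing responses in some scenarios.
4. Advancing the convergence of DevOps and MLOps: take part in building the infrastructure for machine learning training pipelines (ML Pipeline) and model deployment (Model Serving); explore model version management, A/B testing, monitoring, and rollback mechanisms to help AI capabilities reach production efficiently.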
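To give a feel for the kind of anomaly detection mentioned in item 3, here is a minimal sketch in Python. It is not taken from the team's stack and deliberately swaps in a much simpler technique than LSTM or Isolation Forest: a trailing-window z-score over a metric series. The function name `find_anomalies`, the `window` and `threshold` defaults, and the sample latency values are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the mean of the previous
    `window` points by more than `threshold` sample standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical request latencies in milliseconds, with one spike.
latencies_ms = [101, 99, 100, 102, 98, 100, 101, 350, 99, 100]
print(find_anomalies(latencies_ms))  # prints [7]
```

A production system would more likely feed such series from a monitoring store and use a model that handles seasonality and multiple features; the sketch only shows the basic idea of flagging points that break from recent history.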
Includes English-language materials
Education
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, suitable for complete beginners, with full examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Go
https://www.youtube.com/watch?v=8uiZC0l4Ajw
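A complete tutorial for learning Golang! Under an hour from start to finish, including a full demonstration of how to build an API in Go. No filler, just what you need to know.
Bash
[English] The Bash Guide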
https://guide.bash.academy/
A quality-driven guide through the shell's many features.
https://www.youtube.com/watch?v=tK9Oc6AEnR4
Understanding how to use bash scripting will enhance your productivity by automating tasks, streamlining processes, and making your workflow more efficient.
Linux
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Docker
https://www.youtube.com/watch?v=GFgJkfScVNU
Master Docker in one course; learn about images and containers on Docker Hub, running multiple containers with Docker Compose, automating workflows with Docker Compose Watch, and much more. 🐳
https://www.youtube.com/watch?v=kTp5xUtcalw
Learn how to use Docker and Kubernetes in this complete hands-on course for beginners.
Kubernetes
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
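This tutorial covers the basics of the Kubernetes cluster orchestration system. Each module contains some background information on major Kubernetes features and concepts, and includes an online tutorial for you to work through.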
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
Machine learning
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
MLflow
https://mlflow.org/docs/latest/ml/getting-started/
If you're new to MLflow or seeking a refresher on its core functionalities, the quickstart tutorials here are the perfect starting point.
https://mlflow.org/docs/latest/ml/tutorials-and-examples/
Here you'll find a curated set of resources to help you get started and deepen your knowledge of MLflow.
https://www.youtube.com/watch?v=cjeCAoW83_U
This is a video version of the MLflow Quickstart guide.
https://www.youtube.com/watch?v=DnpEA1XaYlI
MLflow is designed to simplify the challenges of managing the machine learning lifecycle.
Kubeflow
https://huggingface.co/blog/turhancan97/building-your-first-kubeflow-pipeline
Kubeflow is an open-source platform designed to be end-to-end, facilitating each step of the Machine Learning (ML) workflow.
https://www.kubeflow.org/docs/started/introduction/
Kubeflow is the foundation of tools for AI Platforms on Kubernetes.
https://www.youtube.com/watch?v=6wWdNg0GMV4
In this walk-through I will show you how I've created a machine learning pipeline with Kubeflow 1.5 using Jupyter Notebooks, Kubeflow Pipelines, MinIO and KServe.
DevOps
https://roadmap.sh/devops
Step by step guide for DevOps, SRE or any other Operations Role in 2025
https://zhuanlan.zhihu.com/p/562036793
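In DevOps, Dev stands for Development and Ops stands for Operations. In one sentence, DevOps breaks down the barrier between development and operations and unifies the two.
Related positions
Experienced hire | 1-7 years | SOFTWARE
AI Agent, AIGC track
1. Engineer and productize AIGC/LLM capabilities for the various scenarios in the R&D DevOps domain;
2. Optimize and improve the design and performance bottlenecks of existing systems, and take on key technical challenges;
3. Keep track of cutting-edge technology, and introduce and implement new technical solutions for new business scenarios and challenges.
Updated 2025-10-09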

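Experienced hire | 5+ years | Business operations
1. Use AI tools for team collaboration, supporting product development, deployment, and rollout.
2. Build and optimize the system's continuous integration (CI) and continuous delivery (CD) processes, continuously improving delivery efficiency and system stability.
3. Design, implement, and maintain an automated operations system that supports containerized deployment (Docker/K8s) in multi-cloud or hybrid-cloud environments, ensuring high availability and elastic scaling for the business.
4. Monitor and improve system security, performance, and observability; respond to and resolve unexpected problems and failures in production.
5. Continuously explore and validate leading DevOps and AIOps solutions in the industry, and drive the adoption of best practices within the team.
6. Communicate proactively with development, product, operations, and other teams to keep the full "requirements, development, release, operations" flow running smoothly.
Updated 2025-10-17
Campus recruitment
1. Help build and maintain automated deployment processes for hardware-related code, configuring and managing jobs in Jenkins to improve code release efficiency;
2. Take part in the day-to-day management and basic configuration of servers and cloud resources used for hardware R&D;
3. Learn to use monitoring tools and work with the team to detect and resolve system problems in the hardware R&D process promptly;
4. Collaborate with hardware development and testing teams to optimize the R&D process and keep hardware projects moving forward.
Updated 2025-07-18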