
长江存储 (YMTC) AI Platform Architect (J14573)
Experienced hire | Full-time | 5+ years of experience | Location: Wuhan | Status: Hiring
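Qualifications
Education: Bachelor's degree or above in computer science, communications, electronic engineering, or a related field.
Experience:
1. 5+ years of development experience in cloud computing, infrastructure, or HPC.
2. Required: experience operating, scheduling, or building AI platforms on large GPU clusters (100+ GPUs).
3. Expert knowledge of Kubernetes architecture; familiarity with Docker/containerd.
4. Familiarity with NVIDIA GP…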
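Key Responsibilities
1. Build a Kubernetes-based AI compute scheduling platform that provides unified management, pooling, and elastic scheduling of heterogeneous compute such as GPUs and NPUs.
2. Large-scale distributed training infrastructure: build a stable environment for clusters of thousands to tens of thousands of accelerator cards, and optimize the high-performance network.
3. Cloud-native MLOps: build a cloud-native platform covering the full workflow of model development, training, and deployment; implement compute metering and billing, multi-tenant isolation, quota management, and cost optimization (FinOps).
4. Own monitoring and alerting, automated fault recovery (self-healing), and performance tuning for the AI computing cluster; ensure high availability of jobs (SLA).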
Includes English-language materials
Education+
HPC+
https://www.ibm.com/think/topics/hpc
HPC is a technology that uses clusters of powerful processors that work in parallel to process massive, multidimensional data sets and solve complex problems at extremely high speeds.
Kubernetes+
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
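This tutorial (Chinese edition) introduces the basics of the Kubernetes cluster orchestration system. Each module contains background information on major Kubernetes features and concepts, and includes an online tutorial for you to follow.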
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
Docker+
https://www.youtube.com/watch?v=GFgJkfScVNU
Master Docker in one course; learn about images and containers on Docker Hub, running multiple containers with Docker Compose, automating workflows with Docker Compose Watch, and much more. 🐳
https://www.youtube.com/watch?v=kTp5xUtcalw
Learn how to use Docker and Kubernetes in this complete hands-on course for beginners.
containerd+
https://github.com/containerd/containerd/blob/main/docs/getting-started.md
containerd has built-in support for Kubernetes Container Runtime Interface (CRI).
https://www.youtube.com/watch?v=cr1062-s8x4
On this talk, you are going to learn about one of the most important technologies used in the container and Kubernetes space.
https://www.youtube.com/watch?v=u1LeMndEk70
In this video we talk about three key technologies that enable Kubernetes.