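Alibaba Cloud | Alibaba Cloud Intelligence - Dedicated Cloud Platform Intelligent Upgrade R&D Expert - Dedicated Cloud (Beijing/Hangzhou)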
Experienced hire | Full-time | 5+ years | Cloud Intelligence Group | Location: Beijing, Hangzhou | Status: Hiring
Requirements
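1. Bachelor's degree or above in computer science, artificial intelligence, or a related field; solid programming fundamentals; proficiency in Python, Java, or Go; familiarity with asynchronous programming and high-concurrency service development.
2. Experience with intelligent computing and cloud product architecture: familiarity with intelligent computing center (AIDC) infrastructure, heterogeneous compute (GPU/NPU) scheduling, and cloud-native technologies (Docker/K8s). Deep understanding of, and hands-on experience with, full-lifecycle management of cloud products, version evolution, and smooth upgrades of large-scale clusters.
3. AI Agent and intelligent engineering skills: proficiency with mainstream AI orchestration frameworks such as LangChain, AutoGen, or Alibaba Cloud Bailian (阿里云百炼); deep understanding of core Agent architectures (such as ReAct and Plan-and-Execute); ability to deeply integrate AI capabilities into the cloud product control plane (Control Plane…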
Responsibilities
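1. Intelligent platform upgrades: develop and deploy the "hot-upgrade digital human (agent)" for the dedicated cloud platform. Drawing on observability data, build an SLI/SLO/SLA health management system for AI agents, and use automated inspection, diagnosis, contingency plans, and self-healing to continuously improve upgrade quality and the level of automation and autonomy, materially reducing upgrade costs.
2. Upgrade system design for intelligent computing cloud products: own the overall upgrade model and architecture design for the dedicated cloud and intelligent computing infrastructure (GPU/NPU clusters, supernode servers, etc.). Drive the shift of cloud products from traditional "resource delivery" to "task-based delivery" and "AI Native intelligence", and build a highly available, highly elastic upgrade foundation suited to the Agentic era.
3. AI Native end-to-end upgrade observability: build a unified upgrade observability platform (Metrics, Log, Event, Trace) for large models and AI Agents. Move beyond the limits of traditional monitoring to achieve end-to-end observability and performance analysis, from underlying compute resources and cloud platform components up to the decision chain of the AI agent, and pinpoint performance bottlenecks and root causes of anomalies during upgrades.
4. Frontier technology exploration and architecture evolution: stay highly attuned to advances in AI (such as multi-agent collaboration, Deep Research, and Agentic Cloud) and in cloud-native technology. Turn new industry ideas, research papers, and open-source projects into deployable technical solutions, and continuously drive the iteration of the team's technology stack and the long-term evolution of its architecture.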
Includes English-language materials
Education
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for complete beginners, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Go
https://www.youtube.com/watch?v=8uiZC0l4Ajw
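A complete tutorial for learning Golang! Start to finish in under an hour, including a full demonstration of how to build an API in Go. No filler, only what you need to know.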
High concurrency
https://www.baeldung.com/concurrency-principles-patterns
In this tutorial, we’ll discuss some of the design principles and patterns that have been established over time to build highly concurrent applications.
https://www.baeldung.com/java-concurrency
Handling concurrency in an application can be a tricky process with many potential pitfalls. A solid grasp of the fundamentals will go a long way to help minimize these issues.
https://www.oreilly.com/library/view/concurrency-in-go/9781491941294/
You’ll understand how Go chooses to model concurrency, what issues arise from this model, and how you can compose primitives within this model to solve problems.
https://www.oreilly.com/library/view/modern-concurrency-in/9781098165406/
With this book, you'll explore the transformative world of Java 21's key feature: virtual threads.
https://www.youtube.com/watch?v=qyM8Pi1KiiM
https://www.youtube.com/watch?v=wEsPL50Uiyo
Docker
https://www.youtube.com/watch?v=GFgJkfScVNU
Master Docker in one course; learn about images and containers on Docker Hub, running multiple containers with Docker Compose, automating workflows with Docker Compose Watch, and much more. 🐳
https://www.youtube.com/watch?v=kTp5xUtcalw
Learn how to use Docker and Kubernetes in this complete hands-on course for beginners.
Kubernetes
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
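This tutorial introduces the basics of the Kubernetes cluster orchestration system. Each module contains some background information on major Kubernetes features and concepts, and includes an interactive online tutorial.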
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
AI agent
https://www.ibm.com/think/ai-agents
Your one-stop resource for gaining in-depth knowledge and hands-on applications of AI agents.