
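哈啰 SRE / Storage Engineer
Experienced hire | Full-time | Software R&D | Location: Shanghai, Beijing | Status: Hiring
Requirements
• Proficient in Linux systems and Shell/Python/Go scripting; familiar with cluster management and operations tools.
• Familiar with Kubernetes, containerized deployment, and network and storage management.
• Experience with large-scale HPC/AI supercomputing clusters or cloud-native platforms…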
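Responsibilities
• Ensure the high availability, stability, and performance of 10,000-card-scale supercomputing clusters and multi-cloud, multi-cluster environments.
• Build and optimize the monitoring, alerting, logging, tracing, capacity planning, and automated operations systems.
• Support end-to-end reliability and performance optimization for the training, inference, and asset management platforms.
• Work with cutting-edge technologies: Prometheus/Grafana, Loki, K8s Operator, automated operations, and cloud-native platforms.
English-language learning materials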
Linux
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Kernel
https://www.youtube.com/watch?v=C43VxGZ_ugU
I rummage around the Linux kernel source and try to understand what makes computers do what they do.
https://www.youtube.com/watch?v=HNIg3TXfdX8&list=PLrGN1Qi7t67V-9uXzj4VSQCffntfvn42v
Learn how to develop your very own kernel from scratch in this programming series!
https://www.youtube.com/watch?v=JDfo2Lc7iLU
Denshi goes over a simple explanation of what computer kernels are and how they work, alongside what makes the Linux kernel special.
Terraform
https://developer.hashicorp.com/terraform/tutorials
Build, change, and destroy infrastructure with Terraform. Start here to learn the basics of Terraform with your favorite cloud provider.
https://www.youtube.com/watch?v=_45W3Z8XWL4
In this video you will learn the basics of using Terraform.
Ansible
https://docs.ansible.com/ansible/latest/getting_started/index.html
Ansible automates the management of remote systems and controls their desired state.
Performance tuning
https://goperf.dev/
The Go App Optimization Guide is a series of in-depth, technical articles for developers who want to get more performance out of their Go code without relying on guesswork or cargo cult patterns.
https://web.dev/learn/performance
This course is designed for those new to web performance, a vital aspect of the user experience.
https://www.ibm.com/think/insights/application-performance-optimization
Application performance is not just a simple concern for most organizations; it’s a critical factor in their business’s success.
https://www.oreilly.com/library/view/optimizing-java/9781492039259/
Performance tuning is an experimental science, but that doesn’t mean engineers should resort to guesswork and folklore to get the job done.
Bash
[English] The Bash Guide
https://guide.bash.academy/
A quality-driven guide through the shell's many features.
https://www.youtube.com/watch?v=tK9Oc6AEnR4
Understanding how to use bash scripting will enhance your productivity by automating tasks, streamlining processes, and making your workflow more efficient.
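Related positions
Experienced hire | 8+ years | Technical, Development
1. Serve as architect of the MaxCompute management and control system, responsible for the direction in which the product's technical architecture evolves. 2. Design sound product solutions for different customers worldwide, map out the storage, compute, sales, console, and operations architecture, and decide on technical solution selection. 3. Own the design, development, testing, release, and operation of the MaxCompute management and control system. 4. Work with the MaxCompute R&D teams, the SRE middle-platform team, and the Alibaba Cloud sales platform to deliver technical projects as required.
Updated 2025-04-02 | Hangzhou
Experienced hire | 2+ years | 诚云科技
1. Cloud product stability assurance and risk inspection: govern issues affecting the stability and experience of customers' cloud products, run product risk inspections, and follow up on and handle incidents. 2. Customer technical case handling and support: lead and manage work on complex and difficult issues, technical solutions, event support, and joint product and R&D initiatives. 3. Troubleshooting and experience management: efficiently troubleshoot and resolve after-sales technical issues, while improving customer experience and managing customer relationships effectively. 4. Industry specialization and technical knowledge building: distill technical service solutions for customers' industries, build up internal technical documentation, and continuously improve best-practice capability across industries for public cloud and hybrid cloud.
Updated 2025-10-10 | Xi'an, Beijing, Hangzhou
Experienced hire | 3+ years | 诚云科技
1. Cloud product stability assurance and risk inspection: govern issues affecting the stability and experience of customers' cloud products, run product risk inspections, and follow up on and handle incidents. 2. Customer technical case handling and support: lead and manage work on complex and difficult issues, technical solutions, event support, and joint product and R&D initiatives. 3. Troubleshooting and experience management: efficiently troubleshoot and resolve after-sales technical issues, while improving customer experience and managing customer relationships effectively. 4. Industry specialization and technical knowledge building: distill technical service solutions for customers' industries, build up internal technical documentation, and continuously improve best-practice capability across industries for public cloud and hybrid cloud.
Updated 2025-11-26 | Xi'an, Beijing, Hangzhou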