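miHoYo GPU Operations Development Engineer
Experienced hire · Full-time · 2+ years · Programming & Technology · Location: Shanghai · Status: Hiring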
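Requirements
1. Bachelor's degree or above in a computer-related major, with 2+ years of operations experience on medium-to-large-scale Internet infrastructure;
2. Expert in administering Linux distributions such as CentOS/Ubuntu, with a deep understanding of operating system principles and kernel tuning strategies;
3. Familiar with x86 server architecture and GPU heterogeneous computing platforms, with hardware-level fault diagnosis skills;
4. Proficient in the TCP/IP protocol stack and in deploying and tuning mainstream network services (Nginx/HAProxy/Keepalived);
5. Skilled at developing operations scripts in Shell/Python, with Ansib…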
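Responsibilities
1. Own full-lifecycle management of Linux server clusters (production and dev/test environments), including deployment, performance tuning, fault diagnosis, and disaster recovery design;
2. Plan and operate data center hardware resources, covering rack installation and deployment, stress testing, health monitoring, and fault handling for x86 servers from mainstream vendors and for GPU compute nodes;
3. Manage hybrid cloud infrastructure, including OpenStack private cloud and public cloud IaaS resources such as AWS/Aliyun, ensuring high availability and security compliance of the cloud environment;
4. Contribute to the automated operations system: build CI/CD pipelines based on Ansible/SaltStack, develop operations efficiency toolchains, and drive adoption of DevOps practices;
5. Participate in system architecture optimization, and carry out the rollout and day-to-day maintenance of virtualization (KVM/Docker) and container orchestration (K8s) technologies.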
Includes English-language materials
Education+
CentOS+
https://www.freecodecamp.org/news/getting-started-with-centos-15eac7215c99/
CentOS or Community Enterprise OS is an open source distribution based on RHEL or Red Hat Enterprise Linux.
https://www.youtube.com/watch?v=Mi6GUcSW5xs
I'll cover everything you need to know to get up and running with CentOS 8.
Ubuntu+
[English] Tutorials
https://ubuntu.com/tutorials
These tutorials provide a step-by-step process to doing development and dev-ops activities on Ubuntu machines, servers or devices.
https://www.youtube.com/watch?v=D4WyNjt_hbQ
This tutorial is intended for those of you that are looking for a resource for helping you get started using Ubuntu on your laptop or desktop.
Linux+
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Kernel+
https://www.youtube.com/watch?v=C43VxGZ_ugU
I rummage around the Linux kernel source and try to understand what makes computers do what they do.
https://www.youtube.com/watch?v=HNIg3TXfdX8&list=PLrGN1Qi7t67V-9uXzj4VSQCffntfvn42v
Learn how to develop your very own kernel from scratch in this programming series!
https://www.youtube.com/watch?v=JDfo2Lc7iLU
Denshi goes over a simple explanation of what computer kernels are and how they work, alongside what makes the Linux kernel special.
TCP/IP+
[English] What is TCP/IP?
https://www.techtarget.com/searchnetworking/definition/TCP-IP
TCP/IP stands for Transmission Control Protocol/Internet Protocol and is a suite of communication protocols used to interconnect network devices on the internet.
Nginx+
[English] Beginner’s Guide
https://nginx.org/en/docs/beginners_guide.html
This guide gives a basic introduction to nginx and describes some simple tasks that can be done with it.
https://www.youtube.com/watch?v=9t9Mp0BGnyI
NGINX is open-source web server software used for reverse proxy, load balancing, and caching. It's important to understand, especially if you are a backend developer.
Bash+
[English] The Bash Guide
https://guide.bash.academy/
A quality-driven guide through the shell's many features.
https://www.youtube.com/watch?v=tK9Oc6AEnR4
Understanding how to use bash scripting will enhance your productivity by automating tasks, streamlining processes, and making your workflow more efficient.
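Related positions
Experienced hire A172760
1. Maintain the stability of speech-related service systems, handle urgent interventions for production issues, and optimize network access and data center topology; 2. Manage and plan service resources, including GPU/CPU machine resources and other storage and compute queue resources.
Updated 2025-05-27 · Hangzhou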
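Experienced hire A162282
1. Maintain the stability of speech-related service systems, handle urgent interventions for production issues, and optimize network access and data center topology; 2. Manage and plan service resources, including GPU/CPU machine resources and other storage and compute queue resources.
Updated 2025-05-27 · Shenzhen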
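Experienced hire F8BP
1. Maintain the stability of speech-related service systems, handle urgent interventions for production issues, and optimize network access and data center topology; 2. Manage and plan service resources, including GPU/CPU machine resources and other storage and compute queue resources.
Updated 2021-06-24 · Shanghai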
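Experienced hire A66864
Team introduction: Data AML is ByteDance's machine learning platform, providing training and inference systems for recommendation, advertising, CV, speech, and NLP to businesses such as Douyin, Toutiao, and Xigua Video. It supplies machine learning compute to internal business units and researches general-purpose, innovative algorithms for the problems those businesses face. It also offers some of its core machine learning and recommendation system capabilities to external enterprise customers through Volcano Engine. In addition, AML conducts frontier research in areas such as AI for Science and scientific computing. 1. Keep machine learning systems running stably; 2. Own continuous integration and delivery for core services, along with efficient, automated operations optimization, to improve service stability; 3. Own monitoring and metrics for distributed systems; 4. Own cloud-platform enablement, resource optimization, and SLA guarantees for online and offline clusters.
Updated 2024-06-14 · Beijing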