ByteDance SRE (Operations Development) Engineer - Data Speech
Experienced hire | Full-time | F8BP | Location: Shanghai | Status: Hiring
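Requirements
1. Extensive experience with operations development projects.
2. Proficient in at least one or two languages such as Go, Python, or Shell in a Linux environment.
3. Solid understanding of network protocols and related services, such as TCP/IP, DNS, NAT, and load balancing.
4. Extensive system operations experience, including a systematic understanding of common system risks and failures and hands-on experience handling them.
5. Able to work under high pressure, with a strong sense of responsibility and strong skills in problem analysis, troubleshooting, resolution, and communication and coordination.
6. Good documentation habits: writes and updates workflow and technical documentation promptly and as required.
Bonus points:
1. Familiar with common internet components, with a solid understanding of message middleware, distributed caches, and databases.
2. Experience operating GPU servers.
3. Experience operating resource management and task scheduling systems for large-scale distributed systems; familiar with open-source technologies such as YARN, Kubernetes, and Mesos.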
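Responsibilities
1. Keep speech-related service systems stable, handle urgent production incidents, and optimize network access and data-center topology.
2. Manage and plan service resources, including GPU/CPU machines as well as other storage and compute queue resources.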
Includes English-language materials
Linux
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Go
https://www.youtube.com/watch?v=8uiZC0l4Ajw
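A complete tutorial for learning Golang! Start to finish in under an hour, including a full demonstration of how to build an API in Go. No filler, only what you need to know.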
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for absolute beginners, with complete examples, based on the latest Python 3 version.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Bash
[English] The Bash Guide
https://guide.bash.academy/
A quality-driven guide through the shell's many features.
https://www.youtube.com/watch?v=tK9Oc6AEnR4
Understanding how to use bash scripting will enhance your productivity by automating tasks, streamlining processes, and making your workflow more efficient.
TCP/IP
[English] What is TCP/IP?
https://www.techtarget.com/searchnetworking/definition/TCP-IP
TCP/IP stands for Transmission Control Protocol/Internet Protocol and is a suite of communication protocols used to interconnect network devices on the internet.
Middleware
https://www.youtube.com/watch?v=1oWPUpMheGk
Caching
https://hackernoon.com/the-system-design-cheat-sheet-cache
The cache is a layer that stores a subset of data, typically the most frequently accessed or essential information, in a location quicker to access than its primary storage location.
https://www.youtube.com/watch?v=bP4BeUjNkXc
Caching strategies, Distributed Caching, Eviction Policies, Write-Through Cache and Least Recently Used (LRU) cache are all important terms when it comes to designing an efficient system with a caching layer.
https://www.youtube.com/watch?v=dGAgxozNWFE
Distributed systems
https://www.distributedsystemscourse.com/
The home page of a free online class in distributed systems.
https://www.youtube.com/watch?v=7VbL89mKK3M&list=PLOE1GTZ5ouRPbpTnrZ3Wqjamfwn_Q5Y9A
Kubernetes
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
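This tutorial (Chinese edition) introduces the basics of the Kubernetes cluster orchestration system. Each module contains background information on major Kubernetes features and concepts, and includes an online tutorial for you to work through.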
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
Mesos
https://www.baeldung.com/apache-mesos
Apache Mesos is a platform that allows effective resource sharing between such applications.
https://www.oreilly.com/library/view/learn-apache-mesos/9781789137385/
Learn Apache Mesos is the go-to book for anyone eager to master the power of efficient resource management and cluster deployment with Apache Mesos.
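Related positions
Experienced hire | A172760
1. Keep speech-related service systems stable, handle urgent production incidents, and optimize network access and data-center topology; 2. Manage and plan service resources, including GPU/CPU machines as well as other storage and compute queue resources.
Updated 2025-05-27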
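Experienced hire | A162282
1. Keep speech-related service systems stable, handle urgent production incidents, and optimize network access and data-center topology; 2. Manage and plan service resources, including GPU/CPU machines as well as other storage and compute queue resources.
Updated 2025-05-27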
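Experienced hire | 3+ years | A59704
1. Drive improvements to the response latency and performance of foundational services and improve service stability; 2. Build and maintain foundational systems (DNS, LDAP, etc.); 3. Develop an automated operations platform to improve collaboration between operations and development and to standardize operating procedures; 4. Optimize systems to reduce repetitive work; 5. Develop and maintain the company's basic monitoring and alerting systems.
Updated 2025-03-22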