快手大数据运维开发工程师/专家
社招全职3年以上D7209地点:北京状态:招聘
任职要求
1、大学本科及以上学历,计算机或者相关专业,3年以上经验均可; 2、 熟悉Hadoop生态圈各组件基本原理以及使用(包括但不限于Hdfs、Yarn、Hbase、Kafka、Hive、Clickhouse); 3、具备扎实的编程能力,掌握至少一种脚本语言(Shell、Perl、Python等),熟悉Java等开发语言者优先,熟悉常用算法和数据结构; 4、Linux操作系统基础扎实,对操作系统原理有一定了解; 5、具有良好的抗压能力,较强的故障分析排查能力,有很好的技术敏感度和风险识别能力。 符合以下条件优先: 1、有大规模大数据服务集群(包括但不限于Hdfs、Yarn、Hbase、Kafka、Hive、Clickhouse)维护经验,对运维体系建设有自己的见解; 2、有Aiops开发经验,了解常用算法。
工作职责
1、负责公司数万节点大数据集群的各项运维管理工作,保障集群服务的高可用性运行; 2、负责超大规模集群服务运维管理平台的设计与研发工作,保障集群服务版本高速迭代以及变更的风险控制; 3、负责集群服务的监控报警体系规划与产品研发迭代,推进监控报警有效性与智能化; 4、负责集群服务容量规划、服务管理与治理规划与产品研发迭代工作。
包括英文材料
学历+
Hadoop+
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop 为庞大的计算机集群提供可靠的、可伸缩的应用层计算和存储支持,它允许使用简单的编程模型跨计算机群集分布式处理大型数据集,并且支持在单台计算机到几千台计算机之间进行扩展。
[英文] Hadoop Tutorial
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models.
HDFS+
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
https://www.ibm.com/cn-zh/think/topics/hdfs
Hadoop 分布式文件系统 (HDFS) 是一种管理大型数据集的文件系统,可在商用硬件上运行。
Yarn+
[英文] Introduction
https://yarnpkg.com/getting-started
Yarn is an established open-source package manager used to manage dependencies in JavaScript projects.
HBase+
[英文] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model that is similar to Google's big table designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with HBase shell.
Kafka+
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
Hive+
[英文] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
ClickHouse+
[英文] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
脚本+
[英文] Scripting language
https://en.wikipedia.org/wiki/Scripting_language
https://zhuanlan.zhihu.com/p/571097954
一个脚本通常是解释执行而非编译。脚本语言通常都有简单、易学、易用的特性,目的就是希望能让程序员快速完成程序的编写工作。
Bash+
[英文] The Bash Guide
https://guide.bash.academy/
A quality-driven guide through the shell's many features.
https://www.youtube.com/watch?v=tK9Oc6AEnR4
Understanding how to use bash scripting will enhance your productivity by automating tasks, streamlining processes, and making your workflow more efficient.
Perl+
https://www.perl.org/learn.html
Useful links if you are interested in learning Perl
https://www.runoob.com/perl/perl-tutorial.html
本教程适合想从零开始学习 Perl 编程语言的开发人员。当然本教程也会对一些模块进行深入,让你更好的了解 Perl 的应用。
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
中文,免费,零起点,完整示例,基于最新的Python 3版本。
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Java+
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
算法+
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
数据结构+
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data Structures and Algorithms full course tutorial java
Linux+
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
相关职位
社招3年以上D7209
1、参与快手大数据运维产品建设,包括大数据计算引擎运维平台与业务控制台开发落地,保障引擎运维效率以及提升业务使用计算引擎易用性; 2、接受大数据平台系统设计与实现复杂度的挑战,分析和发现系统的优化点,负责推动系统的合理性、可靠性、可用性的提升; 3、为团队引入创新的技术、创新的解决方案,用创新的思路解决问题。
更新于 2025-03-07
社招5年以上腾讯云技术
1.负责腾讯云大数据基础运维和客户问题解决,基于腾讯云提供的EMR、Elasticsearch、TCHouse产品,解决客户在产品使用过程中遇到的问题,为客户业务提供最佳服务体验; 2.负责报障大数据产品服务稳定性,包括全局数智化监控、服务架构容灾、容量管理等基础运维能力建设,保障大数据服务SLA; 3.负责运维标准流程规范制定,建设大数据产品运维标准、大数据产品规范化变更流程和大数据组件可观测性标准等; 4.参与智能化运维AIOps,对标互联网SRE业界优秀经验,基于自研运维平台,实现智能化运维,提升运维效率。
更新于 2025-08-05