
YY Live Big Data Platform Development Engineer (J10509)
Experienced hire | Full-time | 3+ years | Location: Guangzhou | Status: Hiring
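Requirements
1. Bachelor's degree in computer science or a related field, with at least 3 years of work experience in big data development.
2. Familiarity with big data components (such as Hive, Spark, HDFS, YARN, and Ambari), and the ability to write and optimize complex SQL.
3. A strong sense of responsiveness in Hadoop cluster operations and maintenance.
4. Familiarity with the Linux operating system and its commands, and proficiency in Shell scripting.
5. Proficiency in the Java programming language and a deep understanding of the JVM.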
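Responsibilities
1. Develop the big data platform, mainly around the Hadoop cluster, and deliver the development work that business requirements call for.
2. Handle day-to-day operations and maintenance of the big data platform and keep the data platform stable.
3. Resolve faults and performance problems on the big data platform, and improve data storage and computing efficiency.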
Includes English-language materials
Education+
Big Data+
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Hive+
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Spark+
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
HDFS+
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
https://www.ibm.com/cn-zh/think/topics/hdfs
The Hadoop Distributed File System (HDFS) is a file system that manages large data sets and can run on commodity hardware.
Yarn+
[English] Apache Hadoop YARN
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
The fundamental idea of YARN is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons.
SQL+
https://liaoxuefeng.com/books/sql/introduction/index.html
What is SQL? Simply put, SQL is the standard computer language for accessing and manipulating relational databases.
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
Hadoop+
https://www.runoob.com/w3cnote/hadoop-tutorial.html
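Hadoop provides reliable, scalable application-layer computing and storage support for very large computer clusters. It allows large data sets to be processed in a distributed fashion across clusters of computers using a simple programming model, and it scales from a single machine to several thousand machines.
[English] Hadoop Tutorial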
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models.
Linux+
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Bash+
[English] The Bash Guide
https://guide.bash.academy/
A quality-driven guide through the shell's many features.
https://www.youtube.com/watch?v=tK9Oc6AEnR4
Understanding how to use bash scripting will enhance your productivity by automating tasks, streamlining processes, and making your workflow more efficient.
Java+
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
JVM+
https://www.freecodecamp.org/news/jvm-tutorial-java-virtual-machine-architecture-explained-for-beginners/
https://www.youtube.com/watch?v=e2zmmkc5xI0
Related positions

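Experienced hire | 3+ years
1. Design, develop, set up, tune, and troubleshoot the company's big data platform and application platform.
2. Provide platform-level support services for the company's big data computing components, and carry out their research, development, and performance optimization.
3. Track the latest developments in the relevant computing-component communities, and roll out new features through upgrades while keeping the platform running stably.
Updated 2025-04-10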

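Experienced hire | 3+ years | Computer networking technology
1. Lead or take part in the architecture design of the big data platform and in the tuning, modification, and upgrading of its core components.
2. Lead or take part in the design and development of big data products based on customer requirements.
3. Research new data platform technologies and put them into production use.
Updated 2024-10-16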

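Experienced hire
Job responsibilities:
1. Full platform lifecycle management: own the architecture design, core module development, and end-to-end maintenance of the big data platform. Keep the system running stably through systematic monitoring, fault early warning, and emergency response mechanisms.
2. Resource efficiency optimization: analyze platform resource usage in depth, and raise cluster resource utilization through performance tuning, cost control, and dynamic resource scheduling strategies. Also drive the build-out of the data governance system to safeguard data quality, security, and compliance.
Work content:
1. Platform iteration and stability assurance: own the continuous iteration of the data platform's core modules (such as the distributed scheduling system, metadata asset management, and the heterogeneous data integration platform).
2. Deep integration of AI technology: take part in developing the algorithm platform and AI foundation services, and build intelligent data processing pipelines to improve the efficiency of business development.
3. Intelligent data governance: use AI technologies such as NLP and large models to automate data governance, reduce manual effort, and increase the value of data.
Updated 2025-05-14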