
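竞技世界 Data Development Engineer - 2026
Campus recruitment · Full-time · Beijing
Location: Beijing · Status: Hiring
Requirements
1. Familiar with data warehouse concepts, architecture, and design principles, with experience in data modeling and dimensional modeling.
2. Understands data quality management methods and tools, and can monitor and maintain the quality of data in the data warehouse.
3. Familiar with the basic principles and usage of the components of the Hadoop ecosystem (including but not limited to HDFS, YARN, HBase, K…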
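Responsibilities
1. Plan, design, develop, and iterate on the company's big data integration platform, building efficient and easy-to-use data ingestion capabilities for the big data platform.
2. Monitor data warehouse performance and quality to ensure the warehouse's stability and efficiency.
3. Work with business departments to understand business requirements, and design and deliver corresponding data solutions.
Includes English-language materials
Data Warehouse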
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
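Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop provides reliable, scalable application-layer computing and storage support for large computer clusters. It allows distributed processing of large data sets across clusters of computers using simple programming models, and it scales from a single machine to several thousand machines.
[English] Hadoop Tutorial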
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework for storing and processing big data in a distributed environment across clusters of computers using simple programming models.
HDFS
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
https://www.ibm.com/cn-zh/think/topics/hdfs
The Hadoop Distributed File System (HDFS) is a file system that manages large data sets and can run on commodity hardware.
Yarn
[English] Introduction
https://yarnpkg.com/getting-started
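Yarn is an established open-source package manager used to manage dependencies in JavaScript projects. Note that this is a different tool from Apache Hadoop YARN, the Hadoop ecosystem component named in the requirements.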
HBase
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with the HBase shell.
Kafka
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
ClickHouse
[English] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Related Positions

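Campus recruitment · Technology
Team introduction: We focus on building high-quality, high-value data assets that help the company keep improving its data-driven decision-making, operations, growth, marketing, and innovation. By joining us, you will work with cutting-edge big data technologies and take a deep part in business data projects, growing together with the company.
1. Collect, clean, transform, and integrate data from source systems, ensuring data completeness and accuracy; take part in modeling a variety of business data and in developing and iterating data content.
2. Use ETL tools and programming languages (such as SQL, Python, Spark, and Flink) for big data processing and development, writing efficient data processing logic to ensure efficient data transfer and storage.
3. Based on business needs, take part in designing and optimizing the overall architecture of the company's data warehouse, including building fact tables and dimension tables, to ensure the warehouse's scalability, stability, and performance.
4. Monitor data quality regularly and take part in data verification and cleanup, so that the data in the warehouse is reliable and accurate and can support the company's diverse business analysis needs.
5. Work closely with the product, operations, and BI teams to understand and translate business requirements, and design sound data models, data mining models, and data content products.
Updated 2025-08-26 · Shanghai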
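Internship · Group
1. Build a distributed big data service platform, taking part in building company systems including massive data storage, offline/real-time computing, real-time querying, and BI.
2. Build the unified data warehouse for 贝壳集团, taking part in building the storage, querying, and operational data analysis systems for massive data.
3. Handle day-to-day requirement development, enabling efficient data operations that serve growing business and data volumes.
Updated 2025-04-03 · Hangzhou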

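Campus recruitment · Technology
1. Build general-purpose feature data management, algorithm model serving, and online strategy engine capabilities to support efficient iteration of search, recommendation, CV/NLP, and other businesses.
2. Take part in the engineering practice and efficiency optimization of platform capabilities such as offline/real-time computing, task scheduling, resource optimization, distributed training, and inference acceleration.
Updated 2025-08-26 · Shanghai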
