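Beike Big Data Engineer (OLAP Engine Track) (J66019)
Experienced hire, full-time, 4+ years of experience, Platform Tools R&D Department. Location: Beijing. Status: hiring.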
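Requirements
1. Bachelor's degree or above in computer science or a related field, with at least 4 years of relevant work experience.
2. Proficient in at least one of C++, Java, or Scala, with a solid technical foundation, strong coding skills, and familiarity with common data structures and algorithms.
3. Proficient in one or more big data frameworks, with a deep understanding of their internals: Hadoop, Hive, Spark, Flink, Kylin, Kafka, Druid, ClickHouse, Doris, etc. Familiarity with the source code is preferred.
4. Proficient in database principles and familiar with the optimization principles of mainstream OLAP engines. Familiarity with vectorized execution, filter pushdown, materialized views, and columnar storage is preferred.
5. Strong problem-solving ability and a drive for technical investigation.
6. Good communication and teamwork skills, and skilled at systematic thinking.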
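Responsibilities
1. Build the company's OLAP platform to support the multidimensional data analysis needs of all business lines.
2. Perform kernel-level optimization and customization of OLAP engines such as StarRocks, Druid, Presto, and Doris.
3. Design solutions for large-scale data problems that deliver second-level query latency over massive data sets.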
Includes English-language materials
Education
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Scala
Data Structures
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data Structures and Algorithms full course tutorial java
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Big Data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
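Hadoop provides reliable, scalable application-layer computing and storage support for large computer clusters. It allows large data sets to be processed in a distributed manner across clusters of computers using simple programming models, and it scales from a single machine to thousands of machines.
[English] Hadoop Tutorial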
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework for storing and processing big data in a distributed environment across clusters of computers using simple programming models.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Kafka
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
ClickHouse
[English] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
OLAP
https://www.youtube.com/watch?v=iw-5kFzIdgY
OLAP (for online analytical processing) is software for performing multidimensional analysis at high speeds on large volumes of data from a data warehouse, data mart, or some other unified, centralized data store.
Related positions
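Experienced hire, 2+ years of experience, CSIG Technology
1. Plan and build the offline and real-time data warehouses for the Yuanbao business, combining the characteristics of the data, technology, and applications to build a highly available, easily extensible data warehouse system that efficiently meets business data needs.
2. Establish standards and specifications for the data warehouse and data quality, define the data governance plan, and work with internal and external teams to drive implementation, continuously improving data quality and ensuring that data is timely, accurate, and stable.
3. Continuously optimize the data warehouse models, abstract and consolidate general-purpose solutions and platform tooling, and improve efficiency for both developers and data users.
Updated 2025-08-01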
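Experienced hire, 2+ years of experience, Finance Platform
1. Own the architecture, design, and exploration of intelligent applications for the overseas finance data warehouse, including but not limited to designing and developing capabilities for online multidimensional analysis, data analysis and decision support, predictive algorithm models, data storage, data services, data quality, and data products, and use large language models to deliver new forms of data retrieval and analysis.
2. Build data products that provide stable data support for day-to-day finance operations and financial management.
3. Build stable, reliable data services that support the iteration of business systems.
Updated 2025-06-22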
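Experienced hire, A133877
1. Drawing on data from across the Douyin family of products, help build a data warehouse over massive data that fits the characteristics of the security ecosystem business.
2. Own the architecture design and development of data models, performance tuning over massive data, and delivery of requirements in complex business scenarios.
3. Help build data management capabilities around security, quality, efficiency, and cost, and drive adoption in a specific cross-cutting scenario.
4. Engage deeply with the business, understand and appropriately abstract business requirements, realize the value of data, and work closely with business teams.
Updated 2024-08-01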