影石 (Insta360) Data Warehouse Engineer
Experienced hire · Full-time · 3-7 years · Location: Shenzhen · Status: Hiring
Requirements
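I. Basic requirements
1. Bachelor's degree or above in computer science, data science, information technology, or a related field.
2. 3-7+ years of data warehouse design and development experience; experience with a complete enterprise data warehouse build-out is a plus.
3. A strong sense of responsibility, rigorous logical thinking, good communication and coordination skills, and a team spirit; sensitive to data and attentive to data quality.
II. Technical skills
1. Expert knowledge of data warehouse theory, with a deep understanding of methodologies such as layered architecture, subject-area modeling, and dimensional modeling; able to design sound data models for specific business scenarios.
2. Expert in SQL/HQL/Spark SQL development and optimization; able to process TB/PB-scale datasets efficiently, with hands-on experience tuning complex queries.
3. Familiar with the big data technology stack, with Hadoop, Hive, Spark, F…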
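The requirements above mention dimensional modeling. As a minimal sketch, assuming a hypothetical sales process, a star schema keeps numeric measures in a central fact table and descriptive attributes in dimension tables that the fact table references by key. The table names (`fact_sales`, `dim_date`, `dim_product`) and the in-memory SQLite database are illustrative assumptions, not part of the posting:

```python
import sqlite3

# Star schema sketch: one fact table keyed to two dimension tables.
# All table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,   -- surrogate key such as 20250101
    month    TEXT NOT NULL
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    category    TEXT NOT NULL
);
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity    INTEGER NOT NULL,   -- additive measure
    amount      REAL NOT NULL       -- additive measure
);
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                 [(20250101, "2025-01"), (20250201, "2025-02")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "camera"), (2, "accessory")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(20250101, 1, 2, 600.0),
                  (20250101, 2, 5, 100.0),
                  (20250201, 1, 1, 300.0)])

# Typical star-schema query: join the fact table to its dimensions,
# then aggregate the measures by dimension attributes.
rows = conn.execute("""
SELECT d.month, p.category, SUM(f.quantity) AS qty, SUM(f.amount) AS revenue
FROM fact_sales f
JOIN dim_date d    ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.month, p.category
ORDER BY d.month, p.category
""").fetchall()

for row in rows:
    print(row)
```

Keeping attributes in dimensions means a new reporting angle (for example, by product category) is usually a new join and `GROUP BY`, not a change to the fact table.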
Responsibilities
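1. Plan the architecture of the enterprise data warehouse and business data marts, design its layers (ODS/DWD/DWS/ADS), and implement them, keeping the architecture modern, scalable, and performant.
2. Lead data modeling for core business domains using methodologies such as dimensional modeling; design general-purpose, flexible data models and wide tables that keep data consistent, complete, and reusable.
3. Design and implement efficient ETL/ELT data integration solutions; own development, scheduling, and optimization across the full extract, transform, and load process, and resolve performance bottlenecks in large-scale data processing.
4. Drive the data governance program, including data quality monitoring, metadata management, data lineage tracking, and metric standardization, to raise overall data quality and the value of data assets.
5. Handle day-to-day operation and troubleshooting of the data warehouse, keep data services within their SLAs, respond promptly to production issues such as data delays and data errors, and keep data pipelines stable and reliable.
6. Understand business requirements in depth, work closely with data analysts and business teams, and provide high-quality data support and solutions for business decisions, data product iteration, and fine-grained operations.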
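The four layers named in item 1 (ODS/DWD/DWS/ADS) are commonly read as the operational data store, detail, summary, and application layers. The sketch below walks hypothetical order data through them with Python's built-in `sqlite3` module and adds two simple data-quality checks of the kind item 4 describes. All table names, columns, and cleaning rules are assumptions for illustration; a real warehouse would typically run each step as a scheduled job on an engine such as Hive or Spark rather than in one script:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ODS: raw source records landed as-is (every field is still a string).
conn.execute("CREATE TABLE ods_orders (order_id TEXT, user_id TEXT, amount TEXT, dt TEXT)")
conn.executemany("INSERT INTO ods_orders VALUES (?, ?, ?, ?)", [
    ("o1", "u1", "100.5", "2025-01-01"),
    ("o1", "u1", "100.5", "2025-01-01"),  # duplicate, e.g. from a retried extract
    ("o2", "u2", "abc", "2025-01-01"),    # malformed amount
    ("o3", "u1", "50", "2025-01-02"),
    ("o4", "u2", "20", "2025-01-02"),
])


def parse_amount(text):
    """Return the amount as a float, or None if it cannot be parsed."""
    try:
        return float(text)
    except ValueError:
        return None


# DWD: cleaned, typed, deduplicated detail rows.
conn.execute("CREATE TABLE dwd_orders (order_id TEXT PRIMARY KEY, user_id TEXT, amount REAL, dt TEXT)")
rejected = []
for order_id, user_id, amount, dt in conn.execute("SELECT * FROM ods_orders").fetchall():
    value = parse_amount(amount)
    if value is None:
        rejected.append(order_id)  # set bad rows aside instead of loading them
        continue
    # INSERT OR IGNORE skips rows whose order_id is already loaded.
    conn.execute("INSERT OR IGNORE INTO dwd_orders VALUES (?, ?, ?, ?)",
                 (order_id, user_id, value, dt))

# DWS: reusable aggregates at a shared grain (one row per user per day).
conn.execute("""
    CREATE TABLE dws_user_daily AS
    SELECT dt, user_id, COUNT(*) AS order_cnt, SUM(amount) AS gmv
    FROM dwd_orders GROUP BY dt, user_id
""")

# ADS: a result shaped for one consumer, here a daily GMV report.
conn.execute("""
    CREATE TABLE ads_daily_gmv AS
    SELECT dt, SUM(gmv) AS gmv FROM dws_user_daily GROUP BY dt
""")

# Data-quality checks: totals must reconcile across layers, keys must be present.
dwd_total = conn.execute("SELECT SUM(amount) FROM dwd_orders").fetchone()[0]
ads_total = conn.execute("SELECT SUM(gmv) FROM ads_daily_gmv").fetchone()[0]
if abs(dwd_total - ads_total) > 1e-9:
    raise ValueError("GMV does not reconcile between DWD and ADS")
missing = conn.execute("SELECT COUNT(*) FROM dwd_orders WHERE user_id IS NULL").fetchone()[0]
if missing:
    raise ValueError(f"{missing} DWD rows have no user_id")

daily_gmv = conn.execute("SELECT dt, gmv FROM ads_daily_gmv ORDER BY dt").fetchall()
print("rejected:", rejected)
print(daily_gmv)
```

The design choice worth noting is that each layer reads only from the layer below it, so a cleaning rule fixed in DWD flows into every DWS and ADS table built on top of it.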
Includes English-language materials
Education
Data Science
https://roadmap.sh/ai-data-scientist
Step by step roadmap guide to becoming an AI and Data Scientist
Data Warehousing
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
SQL
https://liaoxuefeng.com/books/sql/introduction/index.html
What is SQL? Simply put, SQL is the standard computer language for accessing and processing relational databases.
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
Spark
[English] Learning Spark (book)
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Performance Tuning
https://goperf.dev/
The Go App Optimization Guide is a series of in-depth, technical articles for developers who want to get more performance out of their Go code without relying on guesswork or cargo cult patterns.
https://web.dev/learn/performance
This course is designed for those new to web performance, a vital aspect of the user experience.
https://www.ibm.com/think/insights/application-performance-optimization
Application performance is not just a simple concern for most organizations; it’s a critical factor in their business’s success.
https://www.oreilly.com/library/view/optimizing-java/9781492039259/
Performance tuning is an experimental science, but that doesn’t mean engineers should resort to guesswork and folklore to get the job done.
Big Data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
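Hadoop provides reliable, scalable application-layer compute and storage for large computer clusters. It allows large datasets to be processed in a distributed way across clusters of computers using simple programming models, and it scales from a single computer to thousands of machines.
[English] Hadoop Tutorial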
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows users to store and process big data in a distributed environment across clusters of computers using simple programming models.
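The classic example of these simple programming models is MapReduce: a map function emits key/value pairs, the framework groups them by key (the shuffle), and a reduce function combines each group. The single-process Python sketch below simulates that flow for a word count. It is not Hadoop's Java API; on a real cluster, mappers and reducers run on different nodes and the shuffle moves data over the network:

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

Because mappers only see one record at a time and reducers only see one key at a time, both phases can be split across many machines without coordinating with each other.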
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
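To aggregate an unbounded stream, Flink groups events into windows; a tumbling window is the simplest kind, cutting time into fixed, non-overlapping intervals. The plain-Python sketch below only illustrates that windowing idea on a small, finite list of hypothetical (timestamp, event type) pairs. It is not Flink's API and ignores watermarks, late events, and state management, which a real Flink job must deal with:

```python
from collections import Counter


def tumbling_window_counts(events, window_size):
    """Count events per (window_start, key) using fixed, non-overlapping windows.

    events: iterable of (timestamp_seconds, key) pairs, in any order.
    """
    counts = Counter()
    for ts, key in events:
        window_start = ts - ts % window_size  # align the timestamp to its window
        counts[(window_start, key)] += 1
    return dict(sorted(counts.items()))


events = [(1, "click"), (3, "order"), (7, "click"), (12, "click"), (14, "click")]
windows = tumbling_window_counts(events, window_size=10)
print(windows)
```

Each event belongs to exactly one window, which is what distinguishes tumbling windows from sliding windows, where windows overlap and one event can be counted several times.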
Kafka
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
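A core Kafka idea is that records with the same key go to the same partition, and each partition is an ordered, append-only log, so order is preserved per key. The sketch below simulates that with an in-memory dictionary of lists. The partition count and the `choose_partition` helper are hypothetical, and the stand-in hash (`zlib.crc32`) differs from the murmur2 hash used by the default partitioner in Kafka's Java client:

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical topic with three partitions


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash: crc32 is deterministic across runs (unlike Python's built-in
    # str hash), but it is NOT the murmur2 hash Kafka's Java client uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(records):
    """Append each (key, value) record to the log of the partition its key maps to."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[choose_partition(key)].append((key, value))
    return partitions


records = [("user-1", "login"), ("user-2", "view"), ("user-1", "purchase"), ("user-1", "logout")]
logs = produce(records)
for partition, log in sorted(logs.items()):
    print(partition, log)
```

Ordering is only guaranteed within a partition, so choosing the key (here a user ID) decides which events a consumer will see in order.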
Related Positions
Experienced hire · JVVJP
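1. Own data warehouse architecture design, modeling, and ETL development for one or more ByteDance business lines, such as Xingfuli (幸福里), Toutiao, gaming, Xigua Video, search, novels, and content moderation;
2. Take part in data governance work to improve data usability and data quality, working closely with the data tooling team;
3. Understand business requirements and abstract them appropriately to get value from data, working closely with business teams.
Updated 2021-04-12 · Beijing
Experienced hire · 2+ years
1. Use the Alibaba Cloud DataWorks platform for data collection, cleansing, processing, modeling, and report development, supporting the data needs of complex business scenarios;
2. Understand business logic in depth, communicate efficiently with business teams, and turn vague requirements into clear data development plans;
3. Develop and maintain ETL jobs, keeping data flows accurate, stable, and timely;
4. Manage data lineage and maintain data documentation (such as data dictionaries and descriptions of processing logic) so that data stays traceable;
5. Analyze data anomalies, quickly locate root causes, and drive fixes so that data used for business decisions is reliable;
6. Help optimize data usage workflows so business teams can obtain and analyze data more efficiently.
Updated 2025-05-28 · Shenzhen
Experienced hire · 5+ years · Travel business · AI &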
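1. Take part in designing and developing offline and real-time data warehouse architectures, building efficient, stable, and scalable data warehouse systems.
2. Own data warehouse model design, including star, snowflake, and fact constellation (galaxy) schemas, and define data warehouse development standards.
3. Operate data assets: based on the current state of the business, remove blockers in how data assets are produced and consumed, and improve the data experience for related teams.
4. Develop and maintain ETL/ELT data pipelines so that data is loaded from source systems into the data warehouse efficiently and accurately.
5. Optimize data warehouse performance and resolve bottlenecks in data queries and data loading.
Updated 2025-03-06 · Shanghai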