Hema (盒马) D2 - Data R&D Engineer - Wuhan
Experienced hire | Full-time | 5+ years | Technology - Data | Location: Wuhan | Status: Hiring
Qualifications
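Requirements:
1. Degree in computer science or a related field, bachelor's or above, with at least 1 year of work experience.
2. Familiar with data warehouse theory such as data factory, dimensional modeling, and data lake; hands-on participation in at least 2 data warehouse projects, with practical experience in model design and ETL development. Experience defining a layered data warehouse architecture is preferred; experience designing data models for the retail industry is preferred.
3. Familiar with Hadoop ecosystem technologies such as Hive, HBase, Spark, Flink, Storm, Elasticsearch, Impala, Druid, and Kylin, with experience developing applications on distributed data storage and computing platforms. Familiarity with Alibaba Cloud big data platforms (such as MaxCompute, Blink, DataWorks, and Dataphin) is preferred.
4. Proficiency in one or more programming languages is preferred, such as Java, Python, or Perl; …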
Responsibilities
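Job description:
1. Build the Hema data warehouse, including general-purpose data marts for business domains such as transactions, traffic, marketing, procurement and allocation, inventory, warehousing, delivery, fulfillment, and finance.
2. Develop the full data pipeline, including log instrumentation (event tracking), collection of internal and external data, data synchronization, data cleansing and standardization, data model design, offline data processing, real-time data processing, data services, and data visualization.
3. Participate in data governance, including the design, development, and application of systems for metadata management, data quality checks, and data classification management, to improve data usability, availability, and stability.
4. Participate in planning products such as user CRM, traffic distribution, supplier performance, inventory health, dynamic pricing, and intelligent shift scheduling, and ensure they are delivered.
5. Participate in Hema's data-driven operations: building on a deep understanding of Hema's business, develop systematic end-to-end data solutions, drive business optimization through data plus algorithms, and build a benchmark for new-retail applications.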
Includes English-language materials
Education
Data warehouse
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
ETL
https://www.ibm.com/think/topics/etl
ETL—meaning extract, transform, load—is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system.
https://www.youtube.com/watch?v=OW5OgsLpDCQ
It explains what ETL is and what it can do for you to improve your data analysis and productivity.
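Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop provides reliable, scalable application-layer computing and storage support for large computer clusters. It allows distributed processing of large data sets across clusters of computers using simple programming models, and it scales from a single machine to several thousand machines.
[English] Hadoop Tutorial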
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework for storing and processing big data in a distributed environment across clusters of computers using simple programming models.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
HBase
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with the HBase shell.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Apache Storm
[English] Tutorial
https://storm.apache.org/releases/2.6.0/Tutorial.html
In this tutorial, you'll learn how to create Storm topologies and deploy them to a Storm cluster.
https://www.baeldung.com/apache-storm
This tutorial will be an introduction to Apache Storm, a distributed real-time computation system.
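Related positions
Internship
1. Produce and develop autonomous-driving training data, ensuring dataset quality and diversity.
2. Participate in autonomous-driving data analysis, identify potential issues, propose optimizations, and improve the data production process based on the analysis results.
3. Assist in developing and optimizing AI models related to data production, such as LLM, prompt, and agent applications.
4. Work closely with operations, algorithm, and upstream and downstream teams to ensure an efficient handoff between data requirements and delivery and to keep projects moving smoothly.
Updated 2025-11-28 | Wuhan
Experienced hire | 2+ years
Job description:
1. Participate in the design and development of Zhihu's education-related systems.
2. Assist with requirements gathering and analysis, system design, testing, and deployment, and write R&D design documents.
3. Independently handle technical challenges and development for core modules.
Updated 2024-03-06 | Wuhan
Internship
Job description:
- Develop Xiaomi's big data platform, building for lakehouse scenarios, building Data + AI data processing capabilities, and putting distributed-engine best practices into production.
- Support the data development capabilities of business teams across the group, providing a one-stop data platform solution with PB-scale data processing.
- Continuously improve the data platform's capabilities, usability, and quality.
Updated 2025-07-04 | Wuhan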