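Hema Data R&D Engineer
Internship / Part-time | Hema Class of 2026 Intern Recruitment | Location: Hangzhou | Status: Hiring
Requirements
1. Degree in computer science or a related field, bachelor's or above.
2. Proficiency in one or more programming languages such as Java, Python, or Perl preferred; familiarity with Linux and databases; understanding of computer science fundamentals such as algorithm models, networking, and data structures.
3. Familiarity with data warehouse theory such as the data factory, dimensional modeling, and data lakes; hands-on experience with data warehouse projects, algorithm modeling, and ETL development; experience designing layered data warehouse architectures preferred.
4. Familiarity with Hadoop ecosystem technologies such as Hive, HBase, Spark, Flink, Storm, Elasticsearch, Impala, Druid, and Kylin; experience developing applications on distributed data storage and compute platforms preferred.
5. Good business sense and strong sensitivity to data business scenarios; able to collaborate horizontally, integrate resources across boundaries, combine business and technical innovation into complete data solutions, and plan or improve the data service system as a whole to solve business and product problems.
6. Passion for big data; steady temperament; good verbal communication skills; self-driven, with a strong desire to learn and improve; team-oriented, willing to take on challenges, and able to grow under pressure.
7. Familiarity with AI, LLMs, and related technologies; experience deploying, training, tuning, or applying them preferred.
Responsibilities
1. Build Hema's data warehouse, including general-purpose data marts for business domains such as transactions, traffic, marketing, procurement and allocation, inventory, warehousing, delivery, fulfillment, and finance.
2. Develop the full data pipeline, including log event tracking, internal and external data collection, data synchronization, data cleansing and standardization, data model design, offline data processing, real-time data processing, data services, and data visualization.
3. Participate in data governance, including the design, development, and application of systems for metadata management, data quality checks, and tiered data management, to improve data usability, availability, and stability.
4. Participate in planning products such as user CRM, traffic distribution, supplier performance, inventory health, dynamic pricing, and intelligent shift scheduling, and ensure they are delivered.
5. Participate in Hema's data-driven operations: building on a deep understanding of Hema's business, design systematic end-to-end data solutions, drive business optimization through data and algorithms, and set a benchmark for new-retail applications.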
Includes English-language materials
Education+
Java+
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, starts from zero, with complete examples, based on the latest Python 3 version.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Perl+
https://www.perl.org/learn.html
Useful links if you are interested in learning Perl
https://www.runoob.com/perl/perl-tutorial.html
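This tutorial is aimed at developers who want to learn the Perl programming language from scratch. It also covers some modules in depth to give you a better understanding of how Perl is used.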
Linux+
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Algorithms+
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Data structures+
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data Structures and Algorithms full course tutorial java
Data warehouse+
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
ETL+
https://www.ibm.com/think/topics/etl
ETL—meaning extract, transform, load—is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system.
https://www.youtube.com/watch?v=OW5OgsLpDCQ
It explains what ETL is and what it can do for you to improve your data analysis and productivity.
Hadoop+
https://www.runoob.com/w3cnote/hadoop-tutorial.html
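Hadoop provides reliable, scalable application-layer computing and storage support for very large computer clusters. It allows large data sets to be processed in a distributed way across clusters of computers using simple programming models, and it scales from a single machine to several thousand.
[English] Hadoop Tutorial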
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models.
Hive+
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
HBase+
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model that is similar to Google's big table designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with HBase shell.
Spark+
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink+
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Apache Storm+
[English] Tutorial
https://storm.apache.org/releases/2.6.0/Tutorial.html
In this tutorial, you'll learn how to create Storm topologies and deploy them to a Storm cluster.
https://www.baeldung.com/apache-storm
This tutorial will be an introduction to Apache Storm, a distributed real-time computation system.
ElasticSearch+
https://www.youtube.com/watch?v=a4HBKEda_F8
Learn about Elasticsearch with this comprehensive course designed for beginners, featuring both theoretical concepts and hands-on applications using Python (though applicable to any programming language). The course is structured in two parts: first covering essential Elasticsearch fundamentals including index management, document storage, text analysis, pipeline creation, search functionality, and advanced features like semantic search and embeddings; followed by a practical section where you'll build a real-world website using Elasticsearch as a search engine, working with the Astronomy Picture of the Day (APOD) dataset to implement features such as data cleaning pipelines, tokenization, pagination, and aggregations.
Impala+
[English] Impala Tutorials
https://impala.apache.org/docs/build/html/topics/impala_tutorial.html
This section includes tutorial scenarios that demonstrate how to begin using Impala.
Big data+
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Large language models+
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
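Related positions
Experienced hire | 3+ years | A92288
1. Responsible for data warehouse architecture design, standardized event tracking, data modeling, and ETL development for Xiaomi's internet TV, video, and related businesses.
2. Participate in data governance to improve data usability and data quality, working closely with the data platform team.
3. Understand and appropriately abstract business requirements and solve the business problems of the services supported, working closely with business teams.
4. Track the industry's leading data technology stacks and solutions.
Updated 2025-02-06
Internship | Alibaba Cloud | Class of 2026
Alibaba Cloud continues to deepen its strategic investment in AI, building core scenarios around AI and cloud computing infrastructure, AI foundation model platforms, and enterprise AI applications. To that end, we are actively recruiting talented people:
If you want to take part in collecting, storing, and processing Alibaba Cloud's big data, working with it on a distributed big data platform to support business management decisions;
If you want to take part in the model design, development, and maintenance of Alibaba Cloud's big data system, managing and organizing EB-scale data effectively through metadata and quality systems;
If you want to take part in developing Alibaba Cloud's big data products, applying your business sense and using data analysis and algorithms to find the opportunities behind the data and explore big data commercialization;
If you want to work with world-leading big data processing and application technologies and platforms, and be mentored by leading experts at the forefront of big data;
Then join us!
Updated 2025-06-17
Internship | Cainiao Group | 2026
1. Responsible for collecting, storing, and processing Cainiao Group's big data, working with it on a distributed big data platform to support business management decisions.
2. Participate in the model design, development, and maintenance of Cainiao Group's big data system, managing and organizing EB-scale data effectively through metadata and quality systems.
3. Participate in developing Cainiao Group's big data products, using data analysis and algorithms to identify the business opportunities behind the data and explore big data commercialization.
Updated 2025-06-24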