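Ele.me (饿了么) - Data R&D Expert - Shanghai/Hangzhou
Experienced hire | Full-time | 5+ years | Technology - Data | Location: Hangzhou/Shanghai | Status: Hiring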
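Requirements
1. At least 3 years of experience in data warehousing, with a solid grasp of warehouse model design and ETL development; experience building data systems in the O2O domain is preferred.
2. Proficiency in Kimball dimensional modeling and experience with large-scale data processing (ETL); able to use SQL flexibly for large-scale ETL workloads, with strong SQL performance tuning skills.
3. Familiarity with the Hadoop ecosystem and hands-on experience with it, with emphasis on HDFS, MapReduce, Hive, HBase, and Spark.
4. Familiarity with data warehousing knowledge and skills is preferred, including but not limited to data mart design, metadata management, data quality, and master data management.
5. Command of real-time stream computing; Flink development experience is preferred.
6. Strong data analysis and comprehension skills; able to connect business and data, uncover opportunities, and solve problems.
7. Good verbal communication skills and strong self-motivation.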
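Responsibilities
1. Take part in building the PB-scale data warehouse for the local services business; by building the local services merchant data middle platform, serve millions of local services merchants with rich and stable data products and services.
2. Take part in local services user growth: build a rich audience tag library, data products, and services that help the business keep improving its products, and support end-to-end data construction across user-growth off-platform acquisition, traffic landing, and growth mechanisms.
3. Continuously improve the quality and service of the data middle platform against requirements for accuracy, timeliness, and stability.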
Includes English-language materials
Data warehouse
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
ETL
https://www.ibm.com/think/topics/etl
ETL—meaning extract, transform, load—is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system.
https://www.youtube.com/watch?v=OW5OgsLpDCQ
It explains what ETL is and what it can do for you to improve your data analysis and productivity.
SQL
https://liaoxuefeng.com/books/sql/introduction/index.html
What is SQL? Simply put, SQL is the standard computer language for accessing and processing relational databases.
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
Performance tuning
https://goperf.dev/
The Go App Optimization Guide is a series of in-depth, technical articles for developers who want to get more performance out of their Go code without relying on guesswork or cargo cult patterns.
https://web.dev/learn/performance
This course is designed for those new to web performance, a vital aspect of the user experience.
https://www.ibm.com/think/insights/application-performance-optimization
Application performance is not just a simple concern for most organizations; it’s a critical factor in their business’s success.
https://www.oreilly.com/library/view/optimizing-java/9781492039259/
Performance tuning is an experimental science, but that doesn’t mean engineers should resort to guesswork and folklore to get the job done.
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
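Hadoop provides reliable, scalable application-layer computing and storage support for large computer clusters. It allows distributed processing of large data sets across clusters of computers using simple programming models, and scales from a single machine to thousands of machines.
[English] Hadoop Tutorial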
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models.
HDFS
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
https://www.ibm.com/cn-zh/think/topics/hdfs
The Hadoop Distributed File System (HDFS) is a file system for managing large data sets that can run on commodity hardware.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
HBase
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model, similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with HBase shell.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Data analysis
[English] Data Analyst Roadmap
https://roadmap.sh/data-analyst
Step-by-step guide to becoming a Data Analyst in 2025
MapReduce
https://www.youtube.com/watch?v=bcjSe0xCHbE
https://www.youtube.com/watch?v=cHGaQz0E7AU
In this video I explain the basics of the MapReduce model, an important concept for any software engineer to be aware of.
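Related positions
Experienced hire | 3+ years | Technology - Data
1. Work mainly on data development for the search and recommendation, user growth, and retail businesses.
2. Take part in governance of real-time and offline data pipelines, improving business efficiency through data governance and quality optimization.
3. Abstracting from business understanding and product requirements, take part in the architecture design and implementation of a unified stream-batch data lakehouse that serves business applications.
4. Develop a deep understanding of the e-commerce platform business and, through analysis of process data, continuously locate and uncover potential problems to support business growth.
Updated 2025-08-27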
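Experienced hire | 5+ years | Cloud Intelligence Group
1. Own transport encryption for the parallel file system: build an end-to-end encryption framework for the parallel file system and implement TLS encryption at the data transport layer.
2. Own NFS protocol development for the parallel file system, taking part in the development and maintenance of the multi-head NFS v4.1 server-side protocol state machine.
3. Own the stability of the parallel file system: keep the encryption module and protocol service stable under high concurrency; verify the performance impact of encryption through stress testing, fault injection, and end-to-end testing; and achieve long-term stable operation by improving module observability and refining SOP design and validation.
Updated 2025-08-21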
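Experienced hire | Technology - Development
1. Own the design and development of a distributed file system that handles a range of complex business scenarios, including high-availability, high-reliability, and high-performance design and development of the file system's core IO stack; take part in the design and development of the data path and metadata path.
2. Own stability engineering for the distributed file system, including but not limited to observability, fault tolerance, and multi-tenant QoS system development. For specific risks such as network isolation in dedicated clouds and limited dedicated-line bandwidth, own the targeted stability design, SOPs, and drills.
Updated 2025-06-18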