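Ele.me (饿了么) - Data R&D Expert - Algorithm Data
Experienced hire | Full-time | 3+ years | Technical - Data | Location: Hangzhou | Shanghai | Status: Hiring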
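Requirements
1. Bachelor's degree or above in computer science, mathematics, statistics, or a related field; 3+ years of big data R&D experience in the internet industry; strong logical thinking and business sense.
2. Proficiency in Flink real-time technology, with extensive experience in development, troubleshooting, and tuning; proficiency in SQL and Java, with strong coding skills.
3. Familiarity with the big data ecosystem, including but not limited to Hive, HDFS, Spark, ClickHouse, and the Paimon data lake, with hands-on project development experience.
4. Extensive project experience, including but not limited to: designing a data warehouse architecture from scratch (0 to 1), PB-scale internet data processing, data governance, or engineering platform development.
5. Preference for candidates with experience supporting algorithm data, or with experience in recommender systems, advertising systems, search systems, or related fields.
6. Base location: Shanghai (preferred) or Hangzhou.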
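Responsibilities
1. Contribute primarily to building algorithm data for search and recommendation, user growth, retail, and other businesses; use real-time and offline data technologies to support the development of algorithm samples, features, and related work.
2. Take part in governing real-time and offline data pipelines; through data governance and quality optimization, support performance improvement and cost optimization of the algorithm systems.
3. Based on an understanding of the business and an abstraction of product requirements, take part in the architecture design and implementation of a unified stream-batch data lakehouse for business applications.
4. Develop a deep understanding of the e-commerce platform's algorithm and supercomputing (超算) business; through process-level data analysis, continuously locate and uncover potential problems to support business growth.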
Includes English-language materials
Education
Big Data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
SQL
https://liaoxuefeng.com/books/sql/introduction/index.html
What is SQL? Simply put, SQL is the standard computer language for accessing and processing relational databases.
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
HDFS
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
https://www.ibm.com/cn-zh/think/topics/hdfs
The Hadoop Distributed File System (HDFS) is a file system that manages large data sets and can run on commodity hardware.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
ClickHouse
[English] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
System Design
https://roadmap.sh/system-design
Everything you need to know about designing large scale systems.
https://www.youtube.com/watch?v=F2FmTdLtb_4
This complete system design tutorial covers scalability, reliability, data handling, and high-level architecture with clear explanations, real-world examples, and practical strategies.
Data Governance
https://www.ibm.com/think/topics/data-governance
Data governance is the data management discipline that focuses on the quality, security and availability of an organization’s data.
https://www.youtube.com/watch?v=uPsUjKLHLAg
Building data fabric eliminates the technological complexities of data governance so users can connect to the right data at the right time, regardless of where it resides.
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Recommender Systems
[English] Recommender Systems
https://www.d2l.ai/chapter_recommender-systems/index.html
Recommender systems are widely employed in industry and are ubiquitous in our daily lives.
Advertising Systems
https://github.com/InteractiveAdvertisingBureau/openrtb2.x
Real-time Bidding (RTB) is a way of transacting media that allows an individual ad impression to be put up for bid in real-time.
https://people.eecs.berkeley.edu/~jfc/DataMining/SP12/lecs/lec12.pdf
https://wnzhang.net/teaching/ee448/slides/11-computational-ads.pdf
If a bidder bids higher than his true value, then...
Related positions

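Experienced hire | 3+ years | Technical
1. Development suite R&D: own the R&D and optimization of the big data development suite, including the design and development of modules such as data development, data scheduling, data integration (e.g. FlinkCDC), data lineage, and data quality.
2. Continuously track and adopt new technologies, drive technical innovation and research within the team, and improve the efficiency, stability, and scalability of big data processing and analysis.
3. Understand user needs and business scenarios, provide technical solutions, and drive adoption of the big data development suite on the user side.
Updated 2023-12-26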

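Experienced hire | 2+ years | Technical
1. [Data channel] Build the big data channel Tunnel service.
2. [Storage engine] Kernel R&D for big data storage systems such as HDFS, Alluxio, and JuiceFS: track community releases, improve performance and stability, and build custom features.
3. [Business support] Investigate, locate, and resolve production cluster issues; work with operations engineers to keep production clusters stable; help business teams make good use of the big data platform.
4. [Platform planning] Take part in planning the technical evolution of the company's storage platform, building a highly stable, high-performance, low-cost storage platform.
Updated 2024-07-03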

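Experienced hire | 2+ years | Technical
1. [Engine R&D] Kernel R&D for big data query engines based on Spark, Presto, and Hive: track community releases, improve performance and stability, develop new features, and fix kernel bugs.
2. [Business support] Investigate, locate, and resolve production cluster issues; work with operations engineers to keep production clusters stable; help business teams make good use of the big data platform.
3. [Platform planning] Take part in planning the technical evolution of the company's compute platform and strengthen its lakehouse capabilities; build a highly stable, high-performance, low-cost compute platform on cloud IaaS or self-built IaaS.
Updated 2023-12-26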