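Alibaba Cloud: Alibaba Cloud Intelligence - Big Data R&D Expert - Metadata
Experienced hire · Full-time · 5+ years of experience · Cloud Intelligence Group · Location: Hangzhou · Status: Hiring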
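Requirements
1. Bachelor's degree or above in computer science, software engineering, or a related field.
2. 3+ years of experience in distributed systems development, with solid programming skills in C++, Java, or Python.
3. Proficiency in programming in a Linux environment.
4. Familiarity with data lake technologies such as Hudi, Iceberg, and Delta Lake.
5. Familiarity with big data compute engines such as Spark, Flink, and Presto.
6. Good communication skills and team spirit, with the ability to work closely with other teams.
7. Ability to learn new technologies quickly and adapt to new environments.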
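Responsibilities
1. Design and develop MaxCompute's platform-level capabilities for unified management of multiple data sources, building a unified metadata service that spans lake and warehouse, warehouse and database, multiple engines, and multiple storage systems.
2. Develop and maintain MaxCompute's support for direct queries against heterogeneous data sources and federated analysis across data sources.
3. Design and implement large-scale distributed systems, participating deeply in the joint optimization of compute and storage engines.
4. Work closely with other teams, including product, testing, and operations, to keep the software development process running smoothly.
5. Take part in code reviews and team technical sharing sessions to raise the team's technical level.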
Includes English-language materials
Education
Distributed systems
https://www.distributedsystemscourse.com/
The home page of a free online class in distributed systems.
https://www.youtube.com/watch?v=7VbL89mKK3M&list=PLOE1GTZ5ouRPbpTnrZ3Wqjamfwn_Q5Y9A
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, starts from zero, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction to all of the core concepts in Python.
Linux
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Big data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Presto
[English] What is Presto?
https://prestodb.io/what-is-presto/
https://www.tutorialspoint.com/apache_presto/index.htm
Hudi
[English] Spark Quick Start
https://hudi.apache.org/docs/quick-start-guide
We will walk through code snippets that allow you to insert, update, delete, and query a Hudi table.
https://www.oreilly.com/library/view/apache-hudi-the/9781098173821/
Overcome challenges in building transactional guarantees on rapidly changing data by using Apache Hudi.
https://www.youtube.com/watch?v=pyK18sDYnS0
In this video, I'll introduce you to one of the most popular Data Lake solutions out there, Apache Hudi!
Iceberg
https://iceberg.apache.org/spark-quickstart/
This guide will get you up and running with Apache Iceberg™ using Apache Spark™, including sample code to highlight some powerful features.
https://www.baeldung.com/apache-iceberg-intro
This tutorial will discuss Apache Iceberg, a popular open table format in today’s big data landscape.
https://www.youtube.com/watch?v=TsmhRZElPvM
You’ve probably heard about Apache Iceberg™—after all, it’s been getting a lot of buzz.
Delta Lake
https://delta.io/learn/getting-started/
This guide helps you quickly explore the main features of Delta Lake.
[English] Delta Lake Tutorials
https://delta.io/learn/tutorials/
Try out the latest tutorials for the open-source Delta Lake project.
[English] Tutorial: Delta Lake
https://docs.databricks.com/aws/en/delta/tutorial
This tutorial introduces common Delta Lake operations on Databricks.
https://www.youtube.com/watch?v=fkWxiesfrgk
In this Delta Lake course, we will go through all the important concepts of Delta Lake.
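Related positions
Experienced hire · 6+ years of experience
1. Build the decision-data systems for the Taobao group of businesses (users, marketing, supply chain, search and recommendation, price competitiveness, and so on); combine data and engineering with BI to support management decisions, delivering high-quality, stable 1+N+N decision-data products.
2. Build Taobao's core data assets (user profiles, product assets, and so on); use data, analytics, algorithm, and productization capabilities, together with data science, to support the group's shift to data-driven operations in new-retail scenarios.
3. Build governance systems for Taobao's data models, stability, quality, and cost; develop rich technical and business metadata; and use engineering capabilities to create an advanced Taobao data governance platform that serves front-line businesses.
4. Introduce AIGC large-model capabilities and, by combining data, algorithms, and engineering, refine intelligent data-retrieval tools that make data broadly accessible.
Updated 2025-07-11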
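Experienced hire · 5+ years of experience · Cloud Intelligence Group
1. Design and develop the unified metadata system for DMS, including in-depth research into the 40+ supported data sources and turning the related technology into product features.
2. Develop and maintain DMS capabilities for heterogeneous data source queries, federated analysis across data sources, and data lake analysis.
3. Design and implement large-scale distributed systems, participating deeply in the joint optimization of compute and storage engines.
4. Work closely with other teams, including product, testing, and operations, to keep the software development process running smoothly.
5. Take part in code reviews and team technical sharing sessions to raise the team's technical level.
Updated 2025-09-22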
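Experienced hire · 3+ years of experience · Technology - Data
1. Work mainly on data development for businesses such as search and recommendation, user growth, and retail.
2. Take part in governance of real-time and offline data pipelines, improving business efficiency through data governance and quality optimization.
3. Based on an abstraction of business understanding and product requirements, take part in the architecture design and implementation of a unified stream-batch data lakehouse for business applications.
4. Develop a deep understanding of the e-commerce platform's business and, through analysis of process data, continuously locate and uncover potential problems to support business growth.
Updated 2025-08-27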