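Alibaba Cloud: Alibaba Cloud Intelligence - Metadata R&D Expert (Data+AI) - Hangzhou/Shenzhen
Experienced hire; full-time; 5+ years of experience; Cloud Intelligence Group; Location: Shenzhen | Hangzhou; Status: Hiring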
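Qualifications
1. Bachelor's degree or above in computer science, software engineering, or a related field.
2. 5+ years of distributed systems R&D experience, with solid programming skills in C++, Java, or Python.
3. Proficient in systems programming on Linux, with strong troubleshooting skills and extensive performance tuning experience; familiar with programming in large-scale distributed environments; working knowledge of containerization technologies such as k8s and Docker.
4. Familiar with relational databases such as Oracle, SQL Server, MySQL, and PG (PostgreSQL), or with open-source databases and queue products such as MongoDB, Redis, HBase, and Cassandra; candidates who understand their internals or have operations experience are preferred.
5. Familiar with data lake technologies such as Hudi, Iceberg, and Delta Lake.
6. Familiar with metadata system technologies such as Glue Catalog, Gravitino, Databricks Unity Catalog, and GCP Data Catalog.
7. Familiar with big data compute engines, with development experience in Presto, Hudi, Hive, Spark, Flink, Power BI, or similar.
8. Good communication and teamwork skills; able to work closely with other teams.
9. Able to learn new technologies quickly and adapt to new environments.
Bonus points:
1. Experience leading large, complex projects, and a methodology for doing so.
2. Open-source code contributor or database kernel developer.
3. Knowledge of Alibaba Cloud, AWS, or Microsoft's cloud.
5. Familiarity with the usage, internals, and source code of one product in any of three areas: RDBMS, NoSQL, or big data.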
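Responsibilities
1. Design and develop the unified metadata system for DMS, including in-depth research on the 40+ supported data sources, and turn the resulting technology into product features.
2. Develop and maintain DMS capabilities for heterogeneous data source queries, cross-source federated analysis, and data lake analysis.
3. Design and implement large-scale distributed systems, and take a deep part in the joint optimization of compute engines and storage engines.
4. Work closely with other teams, including product, QA, and operations, to keep the software development process running smoothly.
5. Take part in code reviews and team tech-sharing activities to raise the team's technical level.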
Includes English-language materials
Education
Distributed systems
https://www.distributedsystemscourse.com/
The home page of a free online class in distributed systems.
https://www.youtube.com/watch?v=7VbL89mKK3M&list=PLOE1GTZ5ouRPbpTnrZ3Wqjamfwn_Q5Y9A
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for complete beginners, with full examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Linux
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Performance tuning
https://goperf.dev/
The Go App Optimization Guide is a series of in-depth, technical articles for developers who want to get more performance out of their Go code without relying on guesswork or cargo cult patterns.
https://web.dev/learn/performance
This course is designed for those new to web performance, a vital aspect of the user experience.
https://www.ibm.com/think/insights/application-performance-optimization
Application performance is not just a simple concern for most organizations; it’s a critical factor in their business’s success.
https://www.oreilly.com/library/view/optimizing-java/9781492039259/
Performance tuning is an experimental science, but that doesn’t mean engineers should resort to guesswork and folklore to get the job done.
Kubernetes
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
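This tutorial covers the basics of the Kubernetes cluster orchestration system. Each module provides background on major Kubernetes features and concepts, along with an online tutorial to work through.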
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
Docker
https://www.youtube.com/watch?v=GFgJkfScVNU
Master Docker in one course; learn about images and containers on Docker Hub, running multiple containers with Docker Compose, automating workflows with Docker Compose Watch, and much more. 🐳
https://www.youtube.com/watch?v=kTp5xUtcalw
Learn how to use Docker and Kubernetes in this complete hands-on course for beginners.
Oracle
[English] Oracle Tutorial
https://www.oracletutorial.com/
On this website, you can learn Oracle Database fast and easily.
https://www.youtube.com/watch?v=QHYuuXPdQNM&list=PL_c9BZzLwBRJ8f9-pSPbxSSG6lNgxQ4m9
SQL Server
[English] SQL Server Tutorial
https://www.sqlservertutorial.net/
If you are looking for an easy, fast, and efficient way to master SQL Server, you are in the right place.
https://www.youtube.com/watch?v=voTZUMw23pg
MySQL
https://juejin.cn/post/7190306988939542585
A hands-on, all-in-one MySQL learning roadmap covering database fundamentals, SQL statement usage, database constraints, schema design, and more.
[English] MySQL Tutorial
https://www.mysqltutorial.org/
your go-to resource for mastering MySQL in a fast, easy, and enjoyable way.
https://www.youtube.com/watch?v=5OdVJbNCSso
MySQL SQL tutorial for beginners
https://www.youtube.com/watch?v=7S_tz1z_5bA
This beginner-friendly course teaches you SQL from scratch.
PostgreSQL
[English] PostgreSQL Tutorial
https://neon.com/postgresql/tutorial
This PostgreSQL tutorial helps you quickly understand PostgreSQL.
[English] PostgreSQL Tutorial
https://www.pgtutorial.com/
This PostgreSQL tutorial will teach you about PostgreSQL from beginner to advanced.
https://www.youtube.com/watch?v=qw--VYLpxG4
It is the most advanced open source database system widely used to build back-end systems.
https://www.youtube.com/watch?v=SpfIwlAYaKk
Learn PostgreSQL, one of the world's most advanced and robust open-source relational database systems.
MongoDB
https://learnxinyminutes.com/mongodb/
MongoDB is a NoSQL document database for high volume data storage.
https://studio3t.com/academy/#courses
The fastest way to learn MongoDB
https://www.youtube.com/watch?v=c2M-rlkkT5o
This video will give you an introduction to MongoDB in 1 hour. Afterwards I recommend exploring aggregation, replication, and sharding.
https://www.youtube.com/watch?v=ExcRbA7fy_A&list=PL4cUxeGkcC9h77dJ-QJlwGlZlTd4ecZOA
You'll learn how to use MongoDB (a NoSQL database) from scratch. You'll also learn how to integrate it into a simple Node.js API.
Redis
[English] Developer Hub
https://redis.io/dev/
Get all the tutorials, learning paths, and more you need to start building—fast.
https://www.runoob.com/redis/redis-tutorial.html
REmote DIctionary Server (Redis) is a key-value storage system written by Salvatore Sanfilippo; it is a cross-platform, non-relational database.
https://www.youtube.com/watch?v=jgpVdJB2sKQ
In this video I will be covering Redis in depth from how to install it, what commands you can use, all the way to how to use it in a real world project.
HBase
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model that is similar to Google's big table designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with HBase shell.
Hudi
[English] Spark Quick Start
https://hudi.apache.org/docs/quick-start-guide
We will walk through code snippets that allow you to insert, update, delete, and query a Hudi table.
https://www.oreilly.com/library/view/apache-hudi-the/9781098173821/
Overcome challenges in building transactional guarantees on rapidly changing data by using Apache Hudi.
https://www.youtube.com/watch?v=pyK18sDYnS0
In this video, I'll introduce you to one of the most popular Data Lake solutions out there, Apache Hudi!
Iceberg
https://iceberg.apache.org/spark-quickstart/
This guide will get you up and running with Apache Iceberg™ using Apache Spark™, including sample code to highlight some powerful features.
https://www.baeldung.com/apache-iceberg-intro
This tutorial will discuss Apache Iceberg, a popular open table format in today’s big data landscape.
https://www.youtube.com/watch?v=TsmhRZElPvM
You’ve probably heard about Apache Iceberg™—after all, it’s been getting a lot of buzz.
Delta Lake
https://delta.io/learn/getting-started/
This guide helps you quickly explore the main features of Delta Lake.
[English] Delta Lake Tutorials
https://delta.io/learn/tutorials/
Try out the latest tutorials for the open-source Delta Lake project.
[English] Tutorial: Delta Lake
https://docs.databricks.com/aws/en/delta/tutorial
This tutorial introduces common Delta Lake operations on Databricks.
https://www.youtube.com/watch?v=fkWxiesfrgk
In this Delta Lake course, we will go through all the important concepts of Delta Lake.
Big data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Presto
[English] What is Presto?
https://prestodb.io/what-is-presto/
https://www.tutorialspoint.com/apache_presto/index.htm
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Power BI
[English] Power BI Tutorial
https://www.tutorialspoint.com/power_bi/index.htm
Power BI is a Data Visualization and Business Intelligence tool that converts data from different data sources to interactive dashboards and BI reports.
https://www.youtube.com/watch?v=FwjaHCVNBWA
Kernel
https://www.youtube.com/watch?v=C43VxGZ_ugU
I rummage around the Linux kernel source and try to understand what makes computers do what they do.
https://www.youtube.com/watch?v=HNIg3TXfdX8&list=PLrGN1Qi7t67V-9uXzj4VSQCffntfvn42v
Learn how to develop your very own kernel from scratch in this programming series!
https://www.youtube.com/watch?v=JDfo2Lc7iLU
Denshi goes over a simple explanation of what computer kernels are and how they work, alongside what makes the Linux kernel special.
AWS
https://aws.amazon.com/
Amazon Web Services offers reliable, scalable, and inexpensive cloud computing services. Free to join, pay only for what you use.
NoSQL
https://nosql-database.org/
Everything about NoSQL Systems – Types, Benefits, and Real-World Uses
https://piaosanlang.gitbooks.io/mongodb/content/section1.1.html
NoSQL (NoSQL = Not Only SQL) refers to non-relational databases. It is an umbrella term for database management systems that differ from traditional relational ones.
https://www.youtube.com/watch?v=0buKQHokLK8
NoSQL databases can operate in multiple modes: as key-value store, document store or wide column store.
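Related positions
Experienced hire; 5+ years of experience; Cloud Intelligence Group
1. Design and develop MaxCompute's platform-level capabilities for unified hosting and multiple data sources, building a unified metadata service across lake and warehouse, warehouse and database, multiple engines, and multiple storage systems.
2. Develop and maintain MaxCompute's support for direct queries on heterogeneous data sources and cross-source federated analysis.
3. Design and implement large-scale distributed systems, and take a deep part in the joint optimization of compute engines and storage engines.
4. Work closely with other teams, including product, QA, and operations, to keep the software development process running smoothly.
5. Take part in code reviews and team tech-sharing activities to raise the team's technical level.
Updated 2025-08-29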
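Experienced hire; 6+ years of experience
1. Build the decision data systems for the Taobao ecosystem (users, marketing, supply chain, search and recommendation, price competitiveness, and so on); through data plus engineering, and together with BI, support management decisions and deliver high-quality, stable "1+N+N" decision data products.
2. Build the Taobao ecosystem's core data assets (user profiles, product assets, and so on); using data, analytics, algorithms, and productization capabilities, and together with data science, support the group's shift to data-driven operations in new retail scenarios.
3. Build governance systems for the Taobao ecosystem's models, stability, quality, and cost; build rich technical and business metadata; and use engineering capabilities to create an advanced Taobao data governance platform that serves front-line business.
4. Bring in AIGC large-model capabilities; through data, algorithms, and engineering, refine intelligent data retrieval tools and make data accessible to everyone.
Updated 2025-07-11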
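Experienced hire; 5+ years of experience; Technology - Data
1. Take part in building the PB-scale data warehouse for Local Services; by building the Local Services retail data middle platform, serve Local Services merchants and users with rich, stable data products and services.
2. Take part in building core Local Services assets such as users, products, and merchants; build rich audience tag libraries, product libraries, business opportunity libraries, and so on, to help the business keep improving its products.
3. Keep improving the quality and service of the data middle platform against requirements for accuracy, timeliness, and stability.
Updated 2025-09-24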