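Alibaba, Alibaba.com (阿里国际站) - Data Services Development Engineer - Hangzhou/Beijing
Experienced hire; full-time; 2+ years of experience; Technology - Development; Location: Beijing | Hangzhou; Status: Hiring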
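Requirements
Job description
1. Degree in computer science or a related field, with at least two years of experience in a relevant role.
2. Familiarity with basic large language model (LLM) concepts and agent orchestration; experience with NL2SQL or LLM-based analytics agents is a plus.
3. Proficiency in Java or C++; experience developing distributed-architecture systems or analytics engines is a plus, as is experience developing or applying distributed OLAP.
4. Proficiency in one or more distributed computing, storage, or scheduling frameworks or tools (Hadoop, Hive, MapReduce, Spark, HBase, Flink, ES, etc.), with extensive big data development experience.
5. Strong business understanding and sensitivity to data and new technologies; able to solve problems quickly by abstracting complex business logic.
6. Positive, honest, and responsible, with strong drive, intellectual curiosity, and team spirit.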
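Responsibilities
1. Build and manage Alibaba's core advertising data assets, and build core commercial data products suited to complex advertising business scenarios.
2. Build data warehouses and perform data analysis on Alibaba's massive data, and explore AI + big data solutions for ICBU advertising business scenarios by combining data, AI, algorithms, and engineering capabilities.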
Includes English-language materials
Large language models (LLMs)
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
OLAP
https://www.youtube.com/watch?v=iw-5kFzIdgY
OLAP (for online analytical processing) is software for performing multidimensional analysis at high speeds on large volumes of data from a data warehouse, data mart, or some other unified, centralized data store.
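Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop provides reliable, scalable application-layer computing and storage for very large computer clusters. It allows large data sets to be processed in a distributed fashion across clusters of computers using simple programming models, and it scales from a single machine to several thousand machines.
[English] Hadoop Tutorial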
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
MapReduce
https://www.youtube.com/watch?v=bcjSe0xCHbE
https://www.youtube.com/watch?v=cHGaQz0E7AU
In this video I explain the basics of the MapReduce model, an important concept for any software engineer to be aware of.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
HBase
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with the HBase shell.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Elasticsearch
https://www.youtube.com/watch?v=a4HBKEda_F8
Learn about Elasticsearch with this comprehensive course designed for beginners, featuring both theoretical concepts and hands-on applications using Python (though applicable to any programming language). The course is structured in two parts: first covering essential Elasticsearch fundamentals including index management, document storage, text analysis, pipeline creation, search functionality, and advanced features like semantic search and embeddings; followed by a practical section where you'll build a real-world website using Elasticsearch as a search engine, working with the Astronomy Picture of the Day (APOD) dataset to implement features such as data cleaning pipelines, tokenization, pagination, and aggregations.
Big data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
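Related positions
Experienced hire; 3+ years of experience; Technology - Development
1. Develop modules of the Turing (图灵) product; understand the business and technology of data assets and data services; independently design technical solutions and deliver high-quality implementations.
2. Continuously improve the system's automated operations and governance capabilities through technical and intelligent means; handle operations for PB-scale tag profiles, an online serving engine at millions of QPS, and intelligent audience targeting.
3. Continuously study the business knowledge and technology of data assets and data services, and iterate on the capabilities of the data asset and service platform.
Updated 2025-09-08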
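Experienced hire; 2+ years of experience
1. Own architecture design and feature development for Taobao Live user growth and operations scenarios, supporting sustained growth of the live-streaming user base.
2. Own the technical architecture for live-streaming e-commerce campaign operations (Double 11, 618) and the design of service stability plans, ensuring services run stably under high-concurrency e-commerce scenarios.
3. Continuously improve the system architecture to ensure high performance, high availability, and high scalability.
4. Research new technologies, complete technology selection and design for projects, and tackle difficult technical problems.
Updated 2025-08-06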
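Experienced hire; Technology
1. Take part in developing core business systems for international mobility services, covering transactions, pricing, checkout, finance and tax, and other areas.
2. Take part in server-side business architecture design, module decomposition, and development.
3. Optimize and refactor systems to improve availability and stability.
4. Communicate and cooperate proactively with other teams, drive project progress, and discuss and offer constructive suggestions.
Updated 2025-08-01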