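Alibaba Alimama - Engine Data Engineer - Beijing
Experienced hire | Full-time | 2+ years of experience | Location: Beijing | Status: Hiring
Requirements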
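1. Strong hands-on and learning ability; familiar with programming languages including, but not limited to, Java; good coding habits; excellent system architecture design skills.
2. Solid big data engineering foundation; proficient in one or more big data technology stacks such as Flink, NoSQL, Kafka, and Hadoop; extensive experience designing and implementing distributed systems; familiar with best practices for data storage and processing.
3. Familiar with the industry's mainstream ad delivery systems and data processing technologies; experience with high-concurrency, high-performance data processing projects is preferred.
4. Strong data sensitivity; experience in related big data domains such as search, recommendation, and advertising is preferred.
5. Strong sense of responsibility, good communication skills and team spirit, proactive, and willing to take on challenges.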
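Responsibilities
Job description: Here you will work with one of China's top digital marketing platforms and see how closely technology and business goals are integrated. You will see how ultra-large-scale data is processed in real time, at high concurrency, and at speed, and you will learn about advanced, cutting-edge distributed processing technologies.
We need you to:
1. Support unified stream and batch processing of Alimama's ultra-large-scale advertising data.
2. Support the development, design, and maintenance of Alimama's advertising big data platform, building a highly reliable, low-cost, easy-to-use, one-stop, end-to-end platform for advertising data integration and ETL processing.
3. Research the storage and transmission of massive data, optimize the system architecture, and continuously improve the timeliness, flexibility, and performance of the offline and nearline systems.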
Includes English-language materials
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Coding standards
[English] Google Style Guides
https://google.github.io/styleguide/
Every major open-source project has its own style guide: a set of conventions (sometimes arbitrary) about how to write code for that project. It is much easier to understand a large codebase when all the code in it is in a consistent style.
System design
https://roadmap.sh/system-design
Everything you need to know about designing large scale systems.
https://www.youtube.com/watch?v=F2FmTdLtb_4
This complete system design tutorial covers scalability, reliability, data handling, and high-level architecture with clear explanations, real-world examples, and practical strategies.
Big data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
NoSQL
https://nosql-database.org/
Everything about NoSQL Systems – Types, Benefits, and Real-World Uses
https://piaosanlang.gitbooks.io/mongodb/content/section1.1.html
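NoSQL (NoSQL = Not Only SQL) refers to non-relational databases. It is a collective term for database systems that differ from traditional relational database management systems.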
https://www.youtube.com/watch?v=0buKQHokLK8
NoSQL databases can operate in multiple modes: as key-value store, document store or wide column store.
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
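Hadoop provides reliable, scalable application-level computing and storage for large computer clusters. It allows large data sets to be processed in a distributed way across clusters of computers using simple programming models, and it scales from a single machine to several thousand machines.
[English] Hadoop Tutorial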
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models.
Distributed systems
https://www.distributedsystemscourse.com/
The home page of a free online class in distributed systems.
https://www.youtube.com/watch?v=7VbL89mKK3M&list=PLOE1GTZ5ouRPbpTnrZ3Wqjamfwn_Q5Y9A
High concurrency
https://www.baeldung.com/concurrency-principles-patterns
In this tutorial, we’ll discuss some of the design principles and patterns that have been established over time to build highly concurrent applications.
https://www.baeldung.com/java-concurrency
Handling concurrency in an application can be a tricky process with many potential pitfalls. A solid grasp of the fundamentals will go a long way to help minimize these issues.
https://www.oreilly.com/library/view/concurrency-in-go/9781491941294/
You’ll understand how Go chooses to model concurrency, what issues arise from this model, and how you can compose primitives within this model to solve problems.
https://www.oreilly.com/library/view/modern-concurrency-in/9781098165406/
With this book, you'll explore the transformative world of Java 21's key feature: virtual threads.
https://www.youtube.com/watch?v=qyM8Pi1KiiM
https://www.youtube.com/watch?v=wEsPL50Uiyo
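Related positions
Experienced hire | MEG
- Responsible for the online search architecture, including architecture R&D for text search, video search, image search, voice retrieval, visual retrieval, news/trending discussions, and other search systems.
- Responsible for service governance and refactoring, cloud-native architecture migration, and search performance optimization, ensuring the scalability and sustainable development of the search system.
- Responsible for high-concurrency architecture mechanisms, stability engineering, retrieval latency optimization, and data-flow system R&D, ensuring the overall availability of the search system.
- Responsible for putting machine learning into production and for intelligent semantic retrieval, raising the intelligence level of search.
- Responsible for innovating the basic retrieval, ranking architecture, and presentation architecture mechanisms, supporting innovation in user experience and the content ecosystem.
Updated 2025-06-10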
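Experienced hire | MEG
- Responsible for the online search architecture, including architecture R&D for text search, video search, image search, voice retrieval, visual retrieval, news/trending discussions, and other search systems.
- Responsible for service governance and refactoring, cloud-native architecture migration, and search performance optimization, ensuring the scalability and sustainable development of the search system.
- Responsible for high-concurrency architecture mechanisms, stability engineering, retrieval latency optimization, and data-flow system R&D, ensuring the overall availability of the search system.
- Responsible for putting large AI models and machine learning into production and for intelligent semantic retrieval, raising the intelligence level of search.
- Responsible for innovating the basic retrieval, ranking architecture, and presentation architecture mechanisms, supporting innovation in user experience and the content ecosystem.
Updated 2025-09-23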
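Experienced hire | 1+ years of experience | 搜一搜 Technology
1. Responsible for the platform for large-scale data ingestion, feature computation, data storage, and publishing across multiple content types such as image-text posts, videos, and accounts.
2. Standardize data engineering practices: improve data quality, pipeline stability, and platform usability; improve the system's high-concurrency processing performance in large-scale distributed environments; and build up reusable solutions and platform tools that raise data R&D efficiency.
3. Support the processing requirements for all kinds of data features in search scenarios, track and adopt the latest industry technologies, and build an industry-leading offline data-flow architecture.
Updated 2025-08-20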