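Alibaba Cloud: Alibaba Cloud Intelligence - MaxCompute R&D Expert - Platform Architecture R&D
Experienced hire | Full-time | 5+ years of experience | Cloud Intelligence Group | Location: Beijing, Hangzhou | Status: Hiring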
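Requirements
1. Able to scale high-concurrency, highly available, strongly isolated platform systems "from 1 to 10"; able to lead the design and delivery of core subsystems in a multi-tenant architecture, supporting millions of users and tens of millions of concurrent jobs.
2. Expert in C++, with a solid command of core mechanisms such as I/O models, multithreaded programming, memory management, exception safety, and resource lifetime control; able to write high-performance, highly robust code for high-concurrency, low-latency scenarios.
3. Experience developing or tuning mainstream big data compute frameworks such as Spark and Flink, with a systematic understanding of their storage and compute principles.
4. Proficient with toolchains such as perf, eBPF, DWARF, and libunwind; able to locate problems across the full stack, from the application layer to the kernel layer, and quickly diagnose complex performance bottlenecks in production.
5. Clean, well-structured code style and a deep understanding of thread safety, memory leaks, resource contention, and similar issues; able to write production-grade system code that "stands up to review, holds up under load testing, and stands the test of time."
6. A relentless pursuit of system performance and stability; good at abstracting problems in complex scenarios and driving solutions to delivery; excellent communication and collaboration skills; actively takes part in design reviews, code reviews, and technical sharing to raise the team's overall technical level.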
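Responsibilities
1) Take a deep part in building the foundation of the next-generation multi-tenant cloud data warehouse, leading the design of highly available, strongly isolated, scalable platform-level systems.
2) Lead R&D of core subsystems such as the Open API system, multi-tenant application isolation, and unified authentication and access control, defining the security and service boundaries of an enterprise-grade data platform.
3) Take part in the evolution and rollout of a new-generation data channel protocol, pushing the limits of data movement under high throughput, low latency, and strong consistency.
4) Build a full-chain observability and steady-state assurance system, driving the platform to be "stable out of the box," supporting the stable operation of Alibaba's ten-million-core clusters, and defining the engineering paradigm for the next-generation cloud-native data warehouse.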
Includes English-language materials
High concurrency
https://www.baeldung.com/concurrency-principles-patterns
In this tutorial, we’ll discuss some of the design principles and patterns that have been established over time to build highly concurrent applications.
https://www.baeldung.com/java-concurrency
Handling concurrency in an application can be a tricky process with many potential pitfalls. A solid grasp of the fundamentals will go a long way to help minimize these issues.
https://www.oreilly.com/library/view/concurrency-in-go/9781491941294/
You’ll understand how Go chooses to model concurrency, what issues arise from this model, and how you can compose primitives within this model to solve problems.
https://www.oreilly.com/library/view/modern-concurrency-in/9781098165406/
With this book, you'll explore the transformative world of Java 21's key feature: virtual threads.
https://www.youtube.com/watch?v=qyM8Pi1KiiM
https://www.youtube.com/watch?v=wEsPL50Uiyo
High availability
https://redis.io/blog/high-availability-architecture/
A high available architecture is when there are a number of different components, modules, or services that work together to maintain optimal performance, irrespective of peak-time loads.
https://www.ibm.com/think/topics/high-availability
High availability (HA) is a term that refers to a system’s ability to be accessible and reliable close to 100% of the time.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Multithreading
https://liaoxuefeng.com/books/java/threading/basic/index.html
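Compared with single-threaded programming, multithreaded programming is distinctive in that threads often need to read and write shared data, and that access must be synchronized.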
https://www.youtube.com/watch?v=_uQgGS_VIXM&list=PLsc-VaxfZl4do3Etp_xQ0aQBoC-x5BIgJ
https://www.youtube.com/watch?v=IEEhzQoKtQU
https://www.youtube.com/watch?v=mTGdtC9f4EU&list=PLL8woMHwr36EDxjUoCzboZjedsnhLP1j4
https://www.youtube.com/watch?v=TPVH_coGAQs&list=PLk6CEY9XxSIAeK-EAh3hB4fgNvYkYmghp
https://www.youtube.com/watch?v=xPqnoB2hjjA
This video is an introduction to multithreading in modern C++.
https://www.youtube.com/watch?v=YKBwKy5PrpQ
Rust threading is easy to implement and improves the efficiency of your applications on multi-core systems!
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Big data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Perf
https://perfwiki.github.io/main/
perf is powerful: it can instrument CPU performance counters, tracepoints, kprobes, and uprobes (dynamic tracing).
https://www.brendangregg.com/bpf-performance-tools-book.html
This book can help you get the most out of your systems and applications, helping you improve performance, reduce costs, and solve software issues.
[English] perf Examples
https://www.brendangregg.com/perf.html
These are some examples of using the perf Linux profiler, which has also been called Performance Counters for Linux (PCL), Linux perf events (LPE), or perf_events.
https://www.youtube.com/watch?v=M6ldFtwWup0
eBPF
https://ebpf.io/get-started/
eBPF is a revolutionary technology that can run sandboxed programs in the Linux kernel without changing kernel source code or loading a kernel module.
Kernel
https://www.youtube.com/watch?v=C43VxGZ_ugU
I rummage around the Linux kernel source and try to understand what makes computers do what they do.
https://www.youtube.com/watch?v=HNIg3TXfdX8&list=PLrGN1Qi7t67V-9uXzj4VSQCffntfvn42v
Learn how to develop your very own kernel from scratch in this programming series!
https://www.youtube.com/watch?v=JDfo2Lc7iLU
Denshi goes over a simple explanation of what computer kernels are and how they work, alongside what makes the Linux kernel special.
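Related positions
Experienced hire | 8+ years of experience | Technical - Development
1. Serve as architect for the MaxCompute management and control system, responsible for the direction in which the product's technical architecture evolves.
2. Design sound product solutions for customers worldwide; map out the architecture for storage, compute, sales, console, and operations; and decide on technology selection.
3. Own the design, development, testing, release, and operations of the MaxCompute management and control system.
4. Work with the MaxCompute R&D teams, the SRE platform team, and the Alibaba Cloud sales platform to deliver technical projects as required.
Updated 2025-04-02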
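Experienced hire | 5+ years of experience | Technical - Development
1. Take part in R&D for the Data Security Center in the enterprise data security governance domain, delivering unified big data and AI data security governance, with unified management of data security, risk, and compliance.
2. Help solve the pain points enterprises face in data security governance; design and implement security solutions that address customers' security problems in data integration, data development and analysis, and other big data governance processes.
3. Take part in the architecture design and iterative evolution of the Data Security Center, continuously improving its security, stability, scalability, performance, and user experience to meet the data security demands created by the evolving forms and scale of big data and AI businesses.
4. Track technology trends in enterprise data security governance; tackle hard technical problems to build security capabilities that fit the business; explore forward-looking technologies; and plan, design, and deliver a future-oriented Data Security Center that stays technically ahead in the field.
Updated 2025-06-16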
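Experienced hire | 5+ years of experience | Product - Commercial
1. Own product planning, design, and delivery for DataWorks, Alibaba Cloud's big data development and governance platform, with a focus on the data ETL product toolchain and the "Data+AI" direction.
2. Research user needs across industries and comparable products in China and abroad, then distill and plan the product iteration roadmap.
3. Own product business model design and GTM, supporting business teams in meeting their goals.
4. Own product evangelism, including internal and external training, marketing events, and data analysis. Track user feedback, analyze user behavior, and continuously improve the product's user experience.
Updated 2025-06-16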