Tencent Big Data Compute Engine R&D Engineer (Beijing/Shenzhen/Shanghai)
Experienced hire | Full-time | Public Technology | Location: Shanghai | Status: Hiring
Requirements
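1. A master's degree or above in computer science or a related field is preferred; solid expertise in databases and distributed computing; follows developments in academia and industry as well as new hardware and new architectures; in-depth research in a specific sub-area is a plus.
2. Familiar with mainstream big data components such as Spark, Presto, Flink, and StarRocks, and with standard database/big data modules and frameworks such as Substrait, Calcite, Arrow, Velox, and Gluten.
3. Familiar with K8s, with some understanding of cloud native and hybrid multi-cloud.
4. Excellent learning ability, creativity, communication skills, and teamwork; a strong sense of responsibility and initiative, a sense of ownership over the work, and self-driven growth.
5. Candidates with open-source contributions, or with contributions to top data warehouse and database conferences such as VLDB and SIGMOD, are preferred.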
Responsibilities
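1. Responsible for core R&D of big data compute engines, building an industry-leading compute kernel for Tencent's intelligent lakehouse and driving the efficient growth of big data businesses.
2. Responsible for researching cutting-edge technology in Tencent's compute kernel area, staying in contact with open-source communities, and introducing cutting-edge technology based on business characteristics and needs.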
Learning materials (including English-language materials)
Education
Big Data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Presto
[English] What is Presto?
https://prestodb.io/what-is-presto/
https://www.tutorialspoint.com/apache_presto/index.htm
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
StarRocks
https://docs.starrocks.io/docs/quick_start/
These Quick Start guides will help you get going with a small StarRocks environment.
https://itnext.io/introduction-to-starrocks-a-new-modern-analytical-database-1db2177d26e1
Recently, I had the opportunity to explore StarRocks, which is the new kid on the block when talking about massive-scale databases that are able to handle petabytes of data.
Calcite
https://calcite.apache.org/docs/tutorial.html
This is a step-by-step tutorial that shows how to build and connect to Calcite.
https://www.baeldung.com/apache-calcite
It’s a powerful data management framework that can be used in various scenarios concerning data access.
Kubernetes
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
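This tutorial (Chinese edition) introduces the basics of the Kubernetes cluster orchestration system. Each module contains some background information on major Kubernetes features and concepts, and includes an online tutorial for you to work through.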
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
Related positions
Experienced hire | TEG | Technology
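1. Responsible for extreme optimization of the lakehouse storage system kernel; design and implement asynchronous intelligent lakehouse optimization modules to improve data write/query performance and resource utilization.
2. Ecosystem integration and compute convergence: deeply integrate compute engines such as Spark, Flink, and SR so that the lakehouse connects smoothly with unified stream-batch scenarios, supporting coordination between real-time data warehousing and offline analysis.
3. Open-source collaboration and technical influence: contribute to open-source projects such as Iceberg, lead the development of customized features, and help improve technical documentation and build the community ecosystem.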
Updated 2025-05-26
Experienced hire | 3+ years | TEG | Technology
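1. Build an industry-leading general-purpose distributed compute engine on the open-source project Ray, including but not limited to: the engine kernel (distributed task scheduling and execution), distributed data processing frameworks, and distributed online service orchestration frameworks.
2. For Data + AI, support and expand business scenarios that use Ray as infrastructure, including but not limited to: data science, data pipeline services for large-model training, online and offline inference, AI agents and application systems, privacy-preserving computing, and graph computing.
3. Integrate deeply with K8s to build service and platform capabilities for ultra-large-scale distributed systems in cloud-native environments, providing the business with highly available, scalable, and easy-to-use clustered services.
4. Participate in open-source co-development and collaboration, raising the influence of the team and its members in the industry.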
Updated 2025-06-09