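Tencent Intelligent Lakehouse R&D Engineer (Shenzhen/Shanghai/Beijing)
Experienced hire · Full-time · TEG · Technology · Location: Shenzhen · Status: Hiring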
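Requirements
1. Solid Java / Scala programming fundamentals and a solid computer science foundation, along with good communication and teamwork skills.
2. Familiarity with the principles and source code of the open-source data lake storage solutions Hudi, Iceberg, and Delta Lake; kernel development experience or community contributions are preferred, as is open-source community committer / PMC status.
3. Familiarity with the Parquet, ORC, and Arrow file formats, or with row-oriented formats such as Avro and Protobuf, is preferred.
4. Familiarity with mainstream big data compute engines such as Spark, Flink, Presto, and Hive is preferred.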
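Responsibilities
1. Drive deep optimization of the lakehouse storage system kernel; design and implement an asynchronous intelligent lakehouse optimization module to improve data write/query performance and resource utilization.
2. Ecosystem integration and compute convergence: deeply integrate compute engines such as Spark, Flink, and SR so that the lakehouse connects smoothly with unified stream-batch scenarios, supporting coordination between real-time data warehousing and offline analytics.
3. Open-source collaboration and technical influence: contribute to open-source projects such as Iceberg, lead the development of customized features, and help improve technical documentation and build the community ecosystem.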
Includes English-language materials
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Scala
Kernel
https://www.youtube.com/watch?v=C43VxGZ_ugU
I rummage around the Linux kernel source and try to understand what makes computers do what they do.
https://www.youtube.com/watch?v=HNIg3TXfdX8&list=PLrGN1Qi7t67V-9uXzj4VSQCffntfvn42v
Learn how to develop your very own kernel from scratch in this programming series!
https://www.youtube.com/watch?v=JDfo2Lc7iLU
Denshi goes over a simple explanation of what computer kernels are and how they work, alongside what makes the Linux kernel special.
Parquet
https://www.youtube.com/watch?v=KLFadWdomyI
Learn all about Apache Parquet, a column-based file format that's popular in the Hadoop/Spark ecosystem.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Presto
[English] What is Presto?
https://prestodb.io/what-is-presto/
https://www.tutorialspoint.com/apache_presto/index.htm
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Big Data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Related positions
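Experienced hire · Public Technology
1. Lead core R&D of big data compute engines, build an industry-leading compute kernel for Tencent's intelligent lakehouse, and drive the efficient growth of big data businesses.
2. Research frontier technologies in the compute kernel field for Tencent, stay in communication with open-source communities, and introduce frontier technologies according to business characteristics and needs.
Updated 2025-07-22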
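Experienced hire · 3+ years · Public Technology
1. Drive deep optimization of the lakehouse storage system kernel; design and implement an asynchronous intelligent lakehouse optimization module to improve data write/query performance and resource utilization.
2. Ecosystem integration and compute convergence: deeply integrate compute engines such as Spark, Flink, and SR so that the lakehouse connects smoothly with unified stream-batch scenarios, supporting coordination between real-time data warehousing and offline analytics.
3. Open-source collaboration and technical influence: contribute to open-source projects such as Iceberg and Lance, lead the development of customized features, and help improve technical documentation and build the community ecosystem.
Updated 2025-10-14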
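Experienced hire · 5+ years · R&D
1. Design and develop highly available, scalable real-time data processing systems that reliably handle massive volumes of user behavior data from smartphones.
2. Own data modeling, architecture design, and development for real-time metric computation, including data ingestion, data processing, and OLAP analysis; implement optimal solutions for massive data volumes that meet performance requirements.
3. Own anomalous-data monitoring and data quality assurance for real-time systems, improving data accuracy, consistency, and stability.
4. Work with PMs and business stakeholders to understand the business in depth and solve problems.
Updated 2025-08-04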