Insta360 (影石) Data Lake Development Engineer
Experienced hire · Full-time · 3+ years of experience · Location: Shenzhen · Status: Hiring
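Requirements
1. Bachelor's degree or above in computer science, software, data, or a related field, with 3+ years of experience in big data or data platform development.
2. Proficiency in at least one of Java, Scala, or Python.
3. Familiarity with distributed computing frameworks such as Spark and Flink, including how they work internally.
4. Familiarity with at least one data lake technology such as Iceberg or Hudi, with production experience.
5. Familiarity with message middleware such as Kafka and with real-time data processing architectures.
6. Familiarity with distributed storage, object storage, and data lake storage architectures.
7. Familiarity with cloud-native technologies such as Docker and Kubernetes is a plus.
8. Hands-on, end-to-end data governance experience; familiarity with metadata and lineage tools such as Atlas / DataHub / Amundsen; understanding of methodologies such as DAMA and DCMM.
9…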
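Responsibilities
1. Design, develop, and operate the enterprise-grade data lake platform, supporting storage and analysis of massive data volumes.
2. Build a lakehouse architecture on Iceberg/Lance to improve data management capabilities.
3. Design and implement encryption across the full data lifecycle (encryption at rest, encryption in transit, field-level encryption), and put in place security mechanisms such as data masking, data classification and grading, and access control to ensure compliance (GDPR / MLPS (等保) / PIPL (个保法)).
4. Contribute to building the data governance system, including but not limited to data quality, data lineage, and data lifecycle management.
5. Optimize data storage, compute performance, and resource costs, and keep the platform running reliably.
6. Drive adoption of the data lake in AI, data analytics, and other business scenarios.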
Includes English-language materials
Education+
Big Data+
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Java+
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Scala+
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, beginner-friendly, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction to all of the core concepts in Python.
Spark+
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink+
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Iceberg+
https://iceberg.apache.org/spark-quickstart/
This guide will get you up and running with Apache Iceberg™ using Apache Spark™, including sample code to highlight some powerful features.
https://www.baeldung.com/apache-iceberg-intro
This tutorial will discuss Apache Iceberg, a popular open table format in today’s big data landscape.
https://www.youtube.com/watch?v=TsmhRZElPvM
You’ve probably heard about Apache Iceberg™—after all, it’s been getting a lot of buzz.
Hudi+
[English] Spark Quick Start
https://hudi.apache.org/docs/quick-start-guide
We will walk through code snippets that allow you to insert, update, delete, and query a Hudi table.
https://www.oreilly.com/library/view/apache-hudi-the/9781098173821/
Overcome challenges in building transactional guarantees on rapidly changing data by using Apache Hudi.
https://www.youtube.com/watch?v=pyK18sDYnS0
In this video, I'll introduce you to one of the most popular Data Lake solutions out there, Apache Hudi!
Kafka+
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.