ByteDance Big Data Development Expert - Data Integration
Experienced hire | Full-time | H5411A | Location: Shenzhen | Status: Hiring
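Requirements
1. Solid computer science fundamentals and a strong grasp of algorithms and data structures; passionate about technology and willing to keep taking on new technologies and business challenges.
2. Proficient in Java or Scala, including concurrent programming and the JVM; holds engineering quality to a high standard.
3. Familiar with the principles of parallel or distributed computing, and with the characteristics and technical approaches of systems that are highly concurrent, highly stable, linearly scalable, and handle massive data volumes.
4. Strong business requirements analysis and problem diagnosis skills, good communication skills, and the ability to learn independently.
5. Development experience with real-time computing frameworks such as Storm, Spark Streaming, or Flink; candidates who have contributed patches to the community are preferred (please specify).
6. Familiarity with technologies such as the Hadoop ecosystem, Kafka, or ClickHouse is preferred.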
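Responsibilities
1. Design and develop the computing architecture for real-time data integration services on the data platform.
2. Optimize the performance and stability of the real-time data integration services.
3. Participate in customizing and improving the Flink core, and maintain collaboration with the open-source community.
4. Plan technical directions such as data integration and data lakes, grow and develop the technical team, and build industry-level influence.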
Includes English-language materials
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Data structures
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data Structures and Algorithms full course tutorial (Java)
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Scala
JVM
https://www.freecodecamp.org/news/jvm-tutorial-java-virtual-machine-architecture-explained-for-beginners/
https://www.youtube.com/watch?v=e2zmmkc5xI0
High concurrency
https://www.baeldung.com/concurrency-principles-patterns
In this tutorial, we’ll discuss some of the design principles and patterns that have been established over time to build highly concurrent applications.
https://www.baeldung.com/java-concurrency
Handling concurrency in an application can be a tricky process with many potential pitfalls. A solid grasp of the fundamentals will go a long way to help minimize these issues.
https://www.oreilly.com/library/view/concurrency-in-go/9781491941294/
You’ll understand how Go chooses to model concurrency, what issues arise from this model, and how you can compose primitives within this model to solve problems.
https://www.oreilly.com/library/view/modern-concurrency-in/9781098165406/
With this book, you'll explore the transformative world of Java 21's key feature: virtual threads.
https://www.youtube.com/watch?v=qyM8Pi1KiiM
https://www.youtube.com/watch?v=wEsPL50Uiyo
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
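Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop provides reliable, scalable application-layer computing and storage support for large computer clusters. It allows distributed processing of large data sets across clusters of computers using simple programming models, and scales from a single machine to thousands of machines.
[English] Hadoop Tutorial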
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework for storing and processing big data in a distributed environment across clusters of computers using simple programming models.
Kafka
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
ClickHouse
[English] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
Related positions
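Experienced hire | A100514
1. Design and develop the computing architecture for real-time data integration services on the data platform.
2. Optimize the performance and stability of the real-time data integration services.
3. Participate in customizing and improving the Flink core, and maintain collaboration with the open-source community.
4. Plan technical directions such as data integration and data lakes, grow and develop the technical team, and build industry-level influence.
Updated 2023-11-08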
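Experienced hire | A183303
1. Design and develop the architecture of Volcano Engine's all-domain data integration product.
2. Extend the data integration Connectors and optimize their performance and stability.
3. Develop a real-time transfer engine that unifies batch and stream processing, improve data integration efficiency for internal and external customers, and build an industry-leading integration platform.
Updated 2023-06-05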
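Experienced hire | 5+ years | Technical - Development
1. Participate in R&D of the data security center for enterprise data security governance, delivering integrated big data/AI data security governance and unified management of data security, risk, and compliance.
2. Help solve pain points in enterprise data security governance by designing and implementing security solutions that address customers' security problems in big data governance processes such as data integration and data development and analysis.
3. Participate in the architecture design and iterative evolution of the data security center, continuously improving system security, stability, scalability, performance, and user experience to meet the data security needs arising from the evolving forms and scale of big data/AI businesses.
4. Participate in researching technical trends in enterprise data security governance, tackle key technical challenges for security capabilities that fit the business, explore forward-looking technologies, and plan, design, and deliver a future-oriented data security center, maintaining technical leadership in enterprise data security governance.
Updated 2025-06-16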