ByteDance Senior Big Data Engineer, Real-Time Computing
Experienced hire | Full-time | X9WV | Location: Shenzhen | Status: Hiring
Requirements
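1. Deep understanding of distributed computing systems, with production experience running real-time computing systems at TB/PB scale; proficient in one or more distributed computing frameworks such as Flink, Spark, or Ray.
2. Data lake development experience; familiar with at least one data lake technology such as Hudi, Iceberg, Paimon, or Delta Lake.
3. Proficient in at least one programming language such as Java, C++, Scala, or Python, with strong coding and troubleshooting skills.
4. Experience with the source code of open-source distributed computing or storage frameworks is a plus.
5. A background in search, advertising, or recommendation businesses is preferred.
Bonus points:
1. Familiarity with the AI ecosystem (PyTorch, Ray, Transformers, etc.), with hands-on experience in production scenarios.
2. Familiarity with the basic principles of large language models (LLMs) and multimodal models, and with common paradigms for combining LLMs with recommender systems.
3. Conviction in technology and a habit of continuously studying the latest industry publications.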
Responsibilities
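1. Design and implement sound stream computing systems for large-scale recommender systems.
2. Design and implement flexible, scalable, stable, and high-performance storage systems and computing models.
3. Troubleshoot production systems, and design and implement the mechanisms and tools needed to keep them running stably.
4. Build industry-leading distributed systems, such as stream computing frameworks, that provide reliable infrastructure for massive data volumes and large-scale business systems.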
Includes English-language materials
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Ray
https://github.com/ray-project/ray
Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://www.youtube.com/watch?v=FhXfEXUUQp0
In this video, I'll teach you everything you need to know about Ray!
https://www.youtube.com/watch?v=fMiAyj2kgac
Using powerful machine learning algorithms is easy using Ray.io and Python.
https://www.youtube.com/watch?v=q_aTbb7XeL4
Parallel and Distributed computing sounds scary until you try this fantastic Python library.
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Scala
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, starts from zero, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction to all of the core concepts in Python.
PyTorch
https://datawhalechina.github.io/thorough-pytorch/
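PyTorch is an important tool for data science research with deep learning. It offers considerable advantages in flexibility, readability, and performance, and in recent years it has become the framework most commonly used in academia to implement deep learning algorithms.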
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
Large Language Models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
Recommender Systems
[English] Recommender Systems
https://www.d2l.ai/chapter_recommender-systems/index.html
Recommender systems are widely employed in industry and are ubiquitous in our daily lives.
Related positions
Experienced hire | A139485
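Team introduction: the big data group of ByteDance's international recommendation architecture team is responsible for designing and developing big data systems for international recommendation scenarios, providing the business with standardized, high-performance feature computation and keeping systems stable and highly available.
1. Design and implement sound data systems for large-scale recommender systems.
2. Troubleshoot production systems, and design and implement the mechanisms and tools needed to keep production systems running stably overall.
3. Build industry-leading distributed systems, such as stream computing frameworks, that provide reliable infrastructure for massive data volumes and large-scale business systems.
Updated 2023-08-07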
Experienced hire | 3+ years | Tencent Maps Core Capabilities
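1. Own algorithm design and strategy development for dynamic information mining and computation in Tencent Maps, and drive technical solutions through to implementation.
2. Continuously optimize dynamic information processing algorithms and models to improve data accuracy and system efficiency.
3. Help build the quality evaluation framework for dynamic information data and define data optimization plans.
4. Track cutting-edge industry technology and apply new techniques to dynamic information processing.
5. Work closely with product and operations teams to understand business needs and turn them into technical solutions.
Updated 2025-07-23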