ByteDance Multimodal Data Lake R&D Expert - Data for AI
Experienced hire | Full-time | A174521A | Location: Beijing | Status: Hiring
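Requirements
1. Bachelor's degree or above in computer science or a related field;
2. Solid computer science fundamentals and strong programming skills; proficient in Java or Python and familiar with mainstream Java or Python frameworks;
3. Familiarity with how K8S works and with open-source components commonly used in the cloud-native AI/big data ecosystem, or with distributed data processing frameworks such as Ray and Spark, is preferred;
4. Experience developing platforms for deep learning, large models, or LLM applications, or familiarity with open-source agent frameworks such as LangChain and Dify, is preferred;
5. Familiarity with data processing workflows and algorithms in the large-model domain is preferred.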
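Responsibilities
1. Build an industry-leading big data + AI cloud platform product that meets data processing and application needs in the era of large models;
2. Own the architecture design and development of the cloud platform, including data processing, resource scheduling, model/operator management, and model deployment services;
3. Build a large-scale task processing system on the K8S stack, and own orchestration and scheduling optimization for heterogeneous resources such as GPUs and CPUs;
4. Own the integration of the cloud platform with the upstream and downstream ecosystem of ByteDance Volcano Engine infrastructure, such as compute, storage, and AI models.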
Includes English-language materials
Education
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese; free, for complete beginners, with full examples, based on the latest version of Python 3.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Kubernetes
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
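(Chinese) This tutorial covers the basics of the Kubernetes cluster orchestration system. Each module provides background on major Kubernetes features and concepts and includes an online tutorial you can follow.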
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
Big Data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Deep Learning
https://d2l.ai/
Interactive deep learning book with code, math, and discussions.
Large Models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
LangChain
https://python.langchain.com/docs/tutorials/
New to LangChain or LLM app development in general? Read this material to quickly get up and running building your first applications.
https://www.freecodecamp.org/news/beginners-guide-to-langchain/
LangChain is a popular framework for creating LLM-powered apps.
AI agent
https://www.ibm.com/think/ai-agents
Your one-stop resource for gaining in-depth knowledge and hands-on applications of AI agents.
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Related Positions
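Experienced hire | 2+ years | A38455
1. Own R&D on the multimodal data lake kernel and storage engine, delivering industry data lake solutions for Data+AI scenarios;
2. Integrate deeply with upper-layer data processing products to build a multimodal data lake ecosystem;
3. Support multimodal data management needs drawn from ByteDance and leading domestic large-model customers;
4. Collaborate closely with the open-source community to grow open-source influence.
Updated 2025-05-19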
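Experienced hire | 3+ years | Technology - Data
1. Lead the team's rollout of a lakehouse and unified stream-batch data architecture, consolidate a multimodal data processing framework, and drive the upgrade of the overall data architecture;
2. Help build an enterprise-grade Data Agent that combines LLMs with business data to create an intelligent decision-making brain;
3. Help build the data asset system for the supply chain platform, including planning and building data domain models and turning domain knowledge into assets, using data, algorithms, engineering, and large-model capabilities to bring automation and intelligence to the business and its products.
Updated 2025-08-04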
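Experienced hire | 5+ years | Technology - Infrastructure Platform
⁃ Solve index efficiency for metadata at the hundred-billion scale, supporting query and access efficiency for massive numbers of files as well as metadata scalability, and provide a high-performance, highly scalable metadata service.
⁃ Design an efficient blob storage format; write high-quality, scalable, highly available, high-performance core low-level storage modules, and take responsibility for module quality.
⁃ Own the performance-critical IO path, including core features such as small-file aggregation, large-file slicing, and EC encoding.
⁃ Building on mainstream industry approaches to AI training acceleration, build high-performance AI storage/cache products suited to Ant Group's needs, providing the storage foundation for high-concurrency training on large models and massive multimodal data, and infrastructure-side support for the continued growth of AI compute.
⁃ Own the long-term technical evolution and stability of the storage products, and take responsibility for upper-layer business outcomes.
Updated 2025-09-28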