Baidu Big Data R&D Engineer (J82413)
Experienced hire | Full-time | ACG | Location: Beijing | Status: Hiring
Requirements
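- Strong interest in internet products and technology, a drive for technical excellence and innovation, and good teamwork skills
- Deep understanding of data structures and algorithm design; proficiency in one or more of Java, Python, C++, or Go
- Experience developing data infrastructure and platforms for large models is a plus
- Specialized computer science knowledge and skills are a plus: Spark, Ray, Flink, Doris, large-model data engineering, etc.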
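Responsibilities
- Design, implement, and maintain data infrastructure systems such as distributed computing, data orchestration, distributed storage, and stream processing, while ensuring scalability, reliability, and security
- Ensure our data platform can scale reliably to the next order of magnitude, meeting business needs for computing, storing, retrieving, and analyzing massive volumes of data
- Build a data infrastructure platform for large models, continuously improving data production efficiency and data quality to support efficient training and performance optimization of large models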
Learning resources (including English-language materials)
Data Structures
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data Structures and Algorithms full course tutorial (Java)
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step-by-step guide to learning Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for complete beginners, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Go
https://www.youtube.com/watch?v=8uiZC0l4Ajw
A complete Golang tutorial! Start to finish in under an hour, including a full demo of building an API in Go. No filler, just what you need to know.
Large Models (LLMs)
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
Spark
[English] Learning Spark (book)
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
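Spark splits a dataset into partitions, applies transformations such as `flatMap` to each partition in parallel, and merges the partial results with operations such as `reduceByKey`. As a minimal sketch of that map-then-reduce shape, assuming no Spark installation, here is a word count in plain Python; the partitions and the helper names `map_partition` and `reduce_by_key` are hypothetical, and the code mimics the dataflow rather than Spark's API.

```python
from collections import Counter
from functools import reduce

# Hypothetical input, pre-split into partitions the way Spark splits a dataset.
partitions = [
    ["spark makes big data simple", "big data"],
    ["data engineering with spark"],
]

def map_partition(lines):
    """Map step: count words locally within one partition (like flatMap plus a local combine)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_by_key(a, b):
    """Reduce step: merge per-partition counts, like reduceByKey(lambda x, y: x + y)."""
    return a + b

# Each partition could be processed on a different worker; here they run sequentially.
partial_counts = [map_partition(p) for p in partitions]
word_counts = reduce(reduce_by_key, partial_counts, Counter())
print(word_counts["data"])  # → 3
```

Counting locally before merging keeps the amount of data shuffled between workers small, which is the same reason Spark's `reduceByKey` combines values on each partition before the shuffle.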
Ray
https://github.com/ray-project/ray
Ray consists of a core distributed runtime and a set of AI libraries for accelerating ML workloads.
https://www.youtube.com/watch?v=FhXfEXUUQp0
In this video, I'll teach you everything you need to know about Apache Ray!
https://www.youtube.com/watch?v=fMiAyj2kgac
Using powerful machine learning algorithms is easy using Ray.io and Python.
https://www.youtube.com/watch?v=q_aTbb7XeL4
Parallel and Distributed computing sounds scary until you try this fantastic Python library.
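Ray's core runtime parallelizes Python functions as tasks: in Ray itself you decorate a function with `@ray.remote`, launch it with `.remote(...)`, and collect results with `ray.get`. To keep this sketch runnable without a Ray cluster, it swaps in the standard library's `ThreadPoolExecutor` to show the same submit-then-gather pattern. The names `count_tokens` and `parallel_token_counts` are hypothetical, and this is an analogy, not Ray's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def count_tokens(doc: str) -> int:
    """Hypothetical per-record task: count whitespace-separated tokens."""
    return len(doc.split())

def parallel_token_counts(docs: List[str], max_workers: int = 4) -> List[int]:
    """Fan out one task per document, then gather results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit() returns a Future immediately, loosely like the ObjectRef
        # that count_tokens.remote(doc) would return in Ray.
        futures = [pool.submit(count_tokens, d) for d in docs]
        # result() blocks until each task finishes, loosely like ray.get(refs).
        return [f.result() for f in futures]

docs = ["large model data", "ray runs tasks in parallel", "hello"]
print(parallel_token_counts(docs))  # → [3, 5, 1]
```

Threads keep the example simple, but CPU-bound Python code is limited by the GIL inside one process; Ray instead runs tasks in separate worker processes that can span many machines.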
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
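Flink processes events as they arrive and can group them into time windows before aggregating. As a minimal sketch of the idea, here is a tumbling-window count in plain Python over a finite list of events; the `(event_time, key)` event shape and the function name are hypothetical, and the code approximates what a keyed tumbling event-time window computes rather than using Flink's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str]  # hypothetical shape: (event_time_seconds, key)

def tumbling_window_counts(events: Iterable[Event], window_size: int) -> Dict[Tuple[int, str], int]:
    """Count events per (window_start, key) using fixed, non-overlapping windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, key in events:
        # Each event falls into exactly one window: [window_start, window_start + window_size).
        window_start = ts - (ts % window_size)
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(1, "click"), (3, "order"), (4, "click"), (11, "click"), (12, "order")]
print(tumbling_window_counts(events, window_size=10))
# → {(0, 'click'): 2, (0, 'order'): 1, (10, 'click'): 1, (10, 'order'): 1}
```

A real stream processor would emit each window's result once a watermark passes the window's end and would need a policy for late events; this sketch skips both by reading the whole list up front.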
Doris
https://doris.apache.org/docs/gettingStarted/what-is-apache-doris
Related positions
Experienced hire | 3+ years | TPG
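- Build Baidu's offline and real-time data warehouses and data lakes for enterprise data
- Build ETL systems, data storage, and metric systems for enterprise data
- Build data models for business domains such as enterprise HR management and financial and operating data
- Use large-model and agent technology to build intelligent insight and analysis capabilities for enterprise data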
Updated 2025-06-12
Experienced hire | A140437
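1. Develop and maintain offline data processing and online data services for various online advertising businesses.
2. Iterate on data service APIs and product requirements; handle code review, bug fixes, and day-to-day service operations.
3. For massive-scale data processing and query needs, design a sound multidimensional data analysis system architecture that adapts to changing business needs and meets diverse requirements.
4. Clean and process massive volumes of logs, and abstract data models that can be reused across multiple businesses.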
Updated 2023-10-20
Experienced hire | 3+ years | J6NQP
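1. Build and optimize strategy algorithms for multiple business lines, including Douyin and Douyin Volcano Edition.
2. Analyze and mine latent correlations in massive data to continuously improve strategy effectiveness and safeguard the user experience.
3. Handle real-time and offline feature extraction and fusion, providing feature services to data mining and strategy platforms.
4. Bring big-data capabilities into product features, making products more data-driven and intelligent.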
Updated 2021-01-19