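Baidu Data R&D Engineer (J85413)
Experienced hire | Full-time | 3+ years | TPG | Location: Beijing | Status: Hiring
Requirements
- 3+ years of development experience in data warehousing or big data;
- Familiar with big data ecosystem technologies such as Hadoop/Spark/Hive/Doris/GreenPlum;
- Familiar with one or more of Scala, Python, and Java;
- Development or optimization experience in data collection, big data processing, data warehouse modeling, ETL system design, or related areas;
- Familiar with BI systems, with strong data sensitivity and quality awareness; experience in HR data development or intelligent data-analysis applications is a plus;
- Familiar with at least one relational database (Oracle/MySQL/PostgreSQL), proficient in SQL, with some SQL tuning ability;
- Pragmatic and self-driven, passionate about data, enjoys digging into problems, with good communication and teamwork skills.
Responsibilities
- Build offline and real-time data warehouses and data lakes for Baidu's enterprise data;
- Build the ETL systems, data storage, and metrics framework for enterprise data;
- Build data models for business domains such as human resources management and financial and operating data;
- Build intelligent insight and analysis capabilities for enterprise data using large-model and agent technologies.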
Including English-language materials
Data warehouse
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
Big data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
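Hadoop provides reliable, scalable application-layer computing and storage support for large computer clusters. It allows distributed processing of large data sets across clusters of computers using simple programming models, and it scales from a single machine to several thousand machines.
[English] Hadoop Tutorial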
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework for storing and processing big data in a distributed environment across clusters of computers using simple programming models.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
Doris
https://doris.apache.org/docs/gettingStarted/what-is-apache-doris
Scala
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, starts from zero, with complete examples, based on the latest Python 3 version.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction to all of the core concepts in Python.
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
ETL
https://www.ibm.com/think/topics/etl
ETL—meaning extract, transform, load—is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system.
https://www.youtube.com/watch?v=OW5OgsLpDCQ
It explains what ETL is and what it can do for you to improve your data analysis and productivity.
System design
https://roadmap.sh/system-design
Everything you need to know about designing large scale systems.
https://www.youtube.com/watch?v=F2FmTdLtb_4
This complete system design tutorial covers scalability, reliability, data handling, and high-level architecture with clear explanations, real-world examples, and practical strategies.
Data analysis
[English] Data Analyst Roadmap
https://roadmap.sh/data-analyst
Step-by-step guide to becoming a Data Analyst in 2025
Oracle
[English] Oracle Tutorial
https://www.oracletutorial.com/
On this website, you can learn Oracle Database fast and easily.
https://www.youtube.com/watch?v=QHYuuXPdQNM&list=PL_c9BZzLwBRJ8f9-pSPbxSSG6lNgxQ4m9
SQL
https://liaoxuefeng.com/books/sql/introduction/index.html
What is SQL? Simply put, SQL is the standard computer language for accessing and processing relational databases.
https://sqlbolt.com/
Learn SQL with simple, interactive exercises.
https://www.youtube.com/watch?v=p3qvj9hO_Bo
In this video we will cover everything you need to know about SQL in only 60 minutes.
MySQL
https://juejin.cn/post/7190306988939542585
A hands-on, all-in-one MySQL learning roadmap covering database fundamentals, SQL statement usage, database constraints, design, and more.
[English] MySQL Tutorial
https://www.mysqltutorial.org/
Your go-to resource for mastering MySQL in a fast, easy, and enjoyable way.
https://www.youtube.com/watch?v=5OdVJbNCSso
MySQL SQL tutorial for beginners
https://www.youtube.com/watch?v=7S_tz1z_5bA
This beginner-friendly course teaches you SQL from scratch.
PostgreSQL
[English] PostgreSQL Tutorial
https://neon.com/postgresql/tutorial
This PostgreSQL tutorial helps you quickly understand PostgreSQL.
[English] PostgreSQL Tutorial
https://www.pgtutorial.com/
This PostgreSQL tutorial will teach you about PostgreSQL from beginner to advanced.
https://www.youtube.com/watch?v=qw--VYLpxG4
It is the most advanced open source database system widely used to build back-end systems.
https://www.youtube.com/watch?v=SpfIwlAYaKk
Learn PostgreSQL, one of the world's most advanced and robust open-source relational database systems.
Related positions
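Experienced hire | ACG
- Design, implement, and maintain data infrastructure systems such as distributed computing, data orchestration, distributed storage, and stream computing, while ensuring scalability, reliability, and security
- Ensure our data platform can reliably scale to the next order of magnitude, meeting the business's needs for computing, storing, retrieving, and analyzing massive data
- Build data infrastructure platforms for large models, continuously improving data production efficiency and data quality to support efficient training and performance optimization of large models
Updated 2025-02-13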
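Campus hire | AIDU program
- Responsible for server-side development of AI-native applications;
- Participate in service architecture design; independently complete business requirements analysis and software design;
- Research cutting-edge AI technologies, design and optimize project solutions, and ensure the technologies are successfully applied in projects;
- Resolve complex concurrency issues in production.
Updated 2025-05-19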
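Experienced hire | 3+ years | A92288
1. Responsible for data warehouse architecture design, standardized event tracking, data modeling, and ETL development for Xiaomi's internet TV, video, and related businesses;
2. Participate in data governance, improving data usability and data quality, working closely with the data platform team;
3. Understand and appropriately abstract business requirements, and solve the business problems the service addresses, working closely with business teams;
4. Track industry-leading data technology stacks and solutions.
Updated 2025-02-06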
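Internship | Alibaba Cloud | Class of 2026
Alibaba Cloud continues to deepen its strategic investment in AI, building core scenarios around infrastructure for AI and cloud computing, AI foundation model platforms, and enterprise-grade AI applications. To that end, we are actively recruiting outstanding talent: if you want to take part in collecting, storing, and processing Alibaba Cloud's big data, processing data on a distributed big data platform to support business management decisions; if you want to take part in designing, developing, and maintaining models for Alibaba Cloud's big data system, effectively managing and organizing EB-scale data through metadata and quality systems; if you want to take part in developing Alibaba Cloud's big data products, applying your business sense and using data analysis and algorithms to uncover the opportunities behind the data and explore big data commercialization; if you want to work with world-leading big data processing and application technologies and platforms, and be mentored by leading experts at the forefront of big data; then join us!
Updated 2025-06-17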