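Pinduoduo Big Data Systems Development Engineer (Senior)
Experienced hire | Full-time | 3+ years | Technical | Location: Shanghai | Status: Hiring
Requirements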
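1. Degree in computer science or a related field, with at least 3 years of server-side development experience;
2. Deep understanding of algorithms and data structures, with solid fundamentals in C/C++ or Java;
3. Ability to develop and optimize large-scale, high-concurrency systems, or strong architecture design skills, with experience on related systems;
4. Good communication and logical thinking skills, plenty of curiosity about technology and products, a quick learner, and good at analyzing and solving practical problems.
Bonus points
1. Experience in search, recommendation, or e-commerce businesses is a plus;
2. Familiarity with some distributed computing and storage systems is a plus, such as Hadoop, Hive, Impala, HBase, TiDB, RocksDB, Spark, Flink, and Kafka.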
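Responsibilities
1. Own the offline big data architecture for e-commerce search and recommendation services; develop and optimize online and offline data pipelines;
2. Design and develop services such as model sample data generation, user behavior profiling, and high-performance online user profile serving; build high-performance offline frameworks and high-performance online KV systems;
3. Understand the business deeply, and abstract and design sound technical architectures that adapt to changing requirements;
4. Survey advanced systems in the industry and build a leading technical architecture that supports the business efficiently.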
Includes English-language materials
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Data Structures
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data Structures and Algorithms full course tutorial in Java
C
https://www.freecodecamp.org/chinese/news/the-c-beginners-handbook/
This handbook follows the 80/20 rule: you will learn 80% of the C programming language in 20% of the time.
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic C Programming language book written by Brian Kernighan and Dennis Ritchie.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
High Concurrency
https://www.baeldung.com/concurrency-principles-patterns
In this tutorial, we’ll discuss some of the design principles and patterns that have been established over time to build highly concurrent applications.
https://www.baeldung.com/java-concurrency
Handling concurrency in an application can be a tricky process with many potential pitfalls. A solid grasp of the fundamentals will go a long way to help minimize these issues.
https://www.oreilly.com/library/view/concurrency-in-go/9781491941294/
You’ll understand how Go chooses to model concurrency, what issues arise from this model, and how you can compose primitives within this model to solve problems.
https://www.oreilly.com/library/view/modern-concurrency-in/9781098165406/
With this book, you'll explore the transformative world of Java 21's key feature: virtual threads.
https://www.youtube.com/watch?v=qyM8Pi1KiiM
https://www.youtube.com/watch?v=wEsPL50Uiyo
System Design
https://roadmap.sh/system-design
Everything you need to know about designing large scale systems.
https://www.youtube.com/watch?v=F2FmTdLtb_4
This complete system design tutorial covers scalability, reliability, data handling, and high-level architecture with clear explanations, real-world examples, and practical strategies.
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
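Hadoop provides reliable, scalable application-layer computing and storage support for large computer clusters. It allows large data sets to be processed in a distributed fashion across clusters of computers using simple programming models, and it scales from a single machine to several thousand machines.
[English] Hadoop Tutorial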
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models.
Hive
[English] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
HBase
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with the HBase shell.
TiDB
RocksDB
https://rocksdb.org/docs/getting-started.html
The RocksDB library provides a persistent key value store.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Kafka
https://developer.confluent.io/what-is-apache-kafka/
https://www.youtube.com/watch?v=CU44hKLMg7k
https://www.youtube.com/watch?v=j4bqyAMMb7o&list=PLa7VYi0yPIH0KbnJQcMv5N9iW8HkZHztH
In this Apache Kafka fundamentals course, we introduce you to the basic Apache Kafka elements and APIs, as well as the broader Kafka ecosystem.
Impala
[English] Impala Tutorials
https://impala.apache.org/docs/build/html/topics/impala_tutorial.html
This section includes tutorial scenarios that demonstrate how to begin using Impala.
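Related positions
Experienced hire | 5+ years | Software and Hardware Services - Charging
1. Build offline and real-time data warehouses on Meituan's data platform, and carry out data analysis and forecasting. 2. Organize business system data, design and develop data models, and produce the foundational data that supports business analysis, ensuring the data is accurate, easy to use, and timely. 3. Handle development for business data requirements, data reports, OLAP, and ad hoc data extraction. 4. Take part in technical decisions and technology selection, define process standards, and improve data quality monitoring and data governance. 5. Process massive volumes of IoT data and train models to improve the efficiency of health-oriented operations and maintenance.
Updated 2025-06-20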