Pinduoduo: Data Development Engineer, Search, Ads, and Recommendation
Experienced hire | Full-time | Technical | Location: Shanghai | Status: Hiring
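Qualifications
1. Expert in data warehouse model design, with extensive experience in ETL development and large-scale data processing;
2. Extensive experience developing and tuning Flink real-time computing jobs; expert in the design principles and practice of high-concurrency, highly available, scalable distributed systems;
3. Application development experience on distributed data storage and computing platforms (Hadoop ecosystem)…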
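Responsibilities
1. Plan, build, and continuously optimize the data systems and supporting tool platforms for core e-commerce businesses such as search, advertising, and recommendation, efficiently serving the data needs of the algorithm, data analysis, and engineering teams;
2. Develop a deep understanding of the business logic, abstract business requirements, and design scalable, high-performance data architectures that respond quickly to business changes and provide efficient, reliable mechanisms for data exchange and sharing;
3. Own the day-to-day operation, monitoring, and reliability of the data processing pipelines (offline and real-time), ensuring that data is produced stably and efficiently and that data issues are resolved promptly.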
Related English-language materials
Data warehouse
https://www.youtube.com/watch?v=9GVqKuTVANE
From Zero to Data Warehouse Hero: A Full SQL Project Walkthrough and Real Industry Experience!
https://www.youtube.com/watch?v=k4tK2ttdSDg
ETL
https://www.ibm.com/think/topics/etl
ETL—meaning extract, transform, load—is a data integration process that combines, cleans and organizes data from multiple sources into a single, consistent data set for storage in a data warehouse, data lake or other target system.
https://www.youtube.com/watch?v=OW5OgsLpDCQ
It explains what ETL is and what it can do for you to improve your data analysis and productivity.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
High concurrency
https://www.baeldung.com/concurrency-principles-patterns
In this tutorial, we’ll discuss some of the design principles and patterns that have been established over time to build highly concurrent applications.
https://www.baeldung.com/java-concurrency
Handling concurrency in an application can be a tricky process with many potential pitfalls. A solid grasp of the fundamentals will go a long way to help minimize these issues.
https://www.oreilly.com/library/view/concurrency-in-go/9781491941294/
You’ll understand how Go chooses to model concurrency, what issues arise from this model, and how you can compose primitives within this model to solve problems.
https://www.oreilly.com/library/view/modern-concurrency-in/9781098165406/
With this book, you'll explore the transformative world of Java 21's key feature: virtual threads.
https://www.youtube.com/watch?v=qyM8Pi1KiiM
https://www.youtube.com/watch?v=wEsPL50Uiyo
High availability
https://redis.io/blog/high-availability-architecture/
A high available architecture is when there are a number of different components, modules, or services that work together to maintain optimal performance, irrespective of peak-time loads.
https://www.ibm.com/think/topics/high-availability
High availability (HA) is a term that refers to a system’s ability to be accessible and reliable close to 100% of the time.
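Related positions
Experienced hire | 3-5 years | Engine
1. Develop the model sample and feature platform, providing efficient and stable learning capability for search, ads, and recommendation algorithm models; 2. Optimize and iterate on the search, ads, and recommendation index transfer system to raise throughput, lower latency, and speed up iteration; 3. Support streaming and batch processing of large-scale search, ads, and recommendation data.
Updated 2025-09-02 | Beijing, Shanghai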
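Campus hire | Backend development
1. Develop the model sample and feature platform, providing efficient and stable learning capability for search, ads, and recommendation algorithm models; 2. Optimize and iterate on the search, ads, and recommendation index transfer system to raise throughput, lower latency, and speed up iteration; 3. Support streaming and batch processing of large-scale search, ads, and recommendation data.
Updated 2025-08-30 | Shanghai, Beijing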