Pinduoduo Search Engine Engineer
Experienced hire | Full-time | Technical | Location: Shanghai | Status: Hiring
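Requirements
1. A solid foundation in computer science theory and strong programming skills;
2. Proficiency in Java/C++ or similar languages, and familiarity with the Linux working environment;
3. Familiarity with Redis/HBase/MongoDB/ElasticSearch/Spark/Flink, etc.; experience optimizing any of these frameworks is a plus;
4. Experience optimizing low-level foundational libraries is preferred, including but not limited to JVM, CPU/GPU, and OpenBLAS/MKL.
Bonus points
1. Experience developing large-scale internet systems (advertising, search, recommendation) is preferred;
2. Familiarity with distributed computing, storage, and machine learning systems is a plus, e.g. es, k8s, Hadoop, MPI, Spark, Flink, tensorflow, HBase.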
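Responsibilities
1. Take part in the architecture design and development of e-commerce search services, building a high-performance, highly available, and scalable architecture that supports rapid business growth;
2. Take part in end-to-end development of the high-concurrency search engine, including backend service architecture for retrieval, scoring, and ranking, and optimize the engine's compute and storage performance;
3. Develop core search components, including the index library, retrieval service, service framework, and resource management, to improve system performance and scalability.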
Includes English-language materials
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Linux
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Redis
[English] Developer Hub
https://redis.io/dev/
Get all the tutorials, learning paths, and more you need to start building—fast.
https://www.runoob.com/redis/redis-tutorial.html
REmote DIctionary Server (Redis) is a key-value storage system written by Salvatore Sanfilippo; it is a cross-platform, non-relational database.
https://www.youtube.com/watch?v=jgpVdJB2sKQ
In this video I will be covering Redis in depth from how to install it, what commands you can use, all the way to how to use it in a real world project.
HBase
[English] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with the HBase shell.
MongoDB
https://learnxinyminutes.com/mongodb/
MongoDB is a NoSQL document database for high volume data storage.
https://studio3t.com/academy/#courses
The fastest way to learn MongoDB
https://www.youtube.com/watch?v=c2M-rlkkT5o
This video will give you an introduction to MongoDB in 1 hour. Afterwards I recommend exploring aggregation, replication, and sharding.
https://www.youtube.com/watch?v=ExcRbA7fy_A&list=PL4cUxeGkcC9h77dJ-QJlwGlZlTd4ecZOA
You'll learn how to use MongoDB (a NoSQL database) from scratch. You'll also learn how to integrate it into a simple Node.js API.
ElasticSearch
https://www.youtube.com/watch?v=a4HBKEda_F8
Learn about Elasticsearch with this comprehensive course designed for beginners, featuring both theoretical concepts and hands-on applications using Python (though applicable to any programming language). The course is structured in two parts: first covering essential Elasticsearch fundamentals including index management, document storage, text analysis, pipeline creation, search functionality, and advanced features like semantic search and embeddings; followed by a practical section where you'll build a real-world website using Elasticsearch as a search engine, working with the Astronomy Picture of the Day (APOD) dataset to implement features such as data cleaning pipelines, tokenization, pagination, and aggregations.
Spark
[English] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Flink
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
JVM
https://www.freecodecamp.org/news/jvm-tutorial-java-virtual-machine-architecture-explained-for-beginners/
https://www.youtube.com/watch?v=e2zmmkc5xI0
Machine Learning
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
Kubernetes
https://kubernetes.io/docs/tutorials/kubernetes-basics/
This tutorial provides a walkthrough of the basics of the Kubernetes cluster orchestration system.
https://kubernetes.io/zh-cn/docs/tutorials/kubernetes-basics/
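This tutorial covers the basics of the Kubernetes cluster orchestration system. Each module contains background information on major Kubernetes features and concepts, and includes an online tutorial for you to follow.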
https://www.youtube.com/watch?v=s_o8dwzRlu4
Hands-On Kubernetes Tutorial | Learn Kubernetes in 1 Hour - Kubernetes Course for Beginners
https://www.youtube.com/watch?v=X48VuDVv0do
Full Kubernetes Tutorial | Kubernetes Course | Hands-on course with a lot of demos
Hadoop
https://www.runoob.com/w3cnote/hadoop-tutorial.html
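Hadoop provides reliable, scalable application-layer computing and storage support for very large computer clusters. It allows large data sets to be processed in a distributed manner across clusters of computers using simple programming models, and it scales from a single machine to several thousand machines.
[English] Hadoop Tutorial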
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models.
TensorFlow
https://www.youtube.com/watch?v=tpCFfeUEGs8
Ready to learn the fundamentals of TensorFlow and deep learning with Python? Well, you’ve come to the right place.
https://www.youtube.com/watch?v=ZUKz4125WNI
This part continues right where part one left off so get that Google Colab window open and get ready to write plenty more TensorFlow code.
Message Passing Interface
https://www.youtube.com/watch?v=7huftuXExV0
Parallel programming and MPI are crucial tools for achieving high performance computing.
[English] 📺 Basics of the Message Passing Interface (MPI) to program distributed memory parallel computers
https://www.youtube.com/watch?v=tm8M5H1OZmw
The Message Passing Interface (MPI) is a widely used standard to program distributed memory parallel computers.
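Related positions
Campus hire | J1004
1. Take part in designing and building search's inverted-index and semantic-index retrieval systems at the scale of tens of billions, and optimize system stability and performance;
2. Take part in designing and building the search strategy and model engines, iterate on search personalization mechanisms, and design a flexible strategy architecture that supports rapid iteration of search algorithms;
3. Take part in work related to search model iteration, covering the full pipeline including model training, prediction, and data flow.
Updated 2025-08-08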
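Experienced hire | 3+ years | Technical
1. Own the architecture design of e-commerce search services, building a high-performance, highly available, and scalable architecture that supports rapid business growth;
2. Own the end-to-end architecture of the high-concurrency search engine, including backend service architecture for retrieval, scoring, and ranking, and optimize the engine's compute and storage performance;
3. Design and develop core search components, including the index library, retrieval service, service framework, and resource management, to improve system performance and scalability;
4. Understand the business deeply, and abstract and design sound technical architecture that adapts to changing requirements.
Updated 2025-09-08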
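Experienced hire | MEG
Build the next-generation efficient, intelligent recommendation backend engine, delivering high performance, high availability, and scalability while continuing to meet business growth needs.
Responsibilities:
- Support upgrades and refreshes of products in Baidu Search such as entity recommendation, query recommendation, sug and related searches, and content recommendation
- Own the design, development, and optimization of the high-concurrency recommendation system architecture for search scenarios, ensuring high availability, scalability, and high performance
- Improve search quality and user experience by processing large-scale data and applying statistics, machine learning, and other techniques
- Track the latest technology trends in the search engine field, such as engine engineering architecture and AI-native approaches, and apply them to products
Updated 2025-05-06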