Baidu Vector Database R&D Engineer (J72255)
Experienced hire · Full-time · 1+ years of experience · ACG · Location: Shanghai · Status: Hiring
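Requirements
- Bachelor's degree or above, with at least 1 year of work experience
- Proficient in C++, Go, Python, or similar programming languages, with good coding style and solid engineering skills
- Deep understanding of data structures and algorithm design; familiar with the principles of mainstream operating systems such as Unix and Linux, and skilled at using system-level facilities to support application development
- R&D experience in databases, big data, document processing and retrieval, cloud storage, or related fields, including but not limited to MongoDB, ElasticSearch, Redis, Doris, ClickHouse, and cloud-native databases, as well as large-scale retrieval systems and distributed storage systems; or R&D experience with database cloud platforms
- Excellent analytical and problem-solving skills, and willingness to tackle hard problems
- Strong drive and curiosity, strong learning and communication skills, and a good sense of teamwork
- Passion for the internet, keen interest in internet products and technologies, and enthusiasm for pursuing technical excellence and innovation
- A rigorous, professional ToB (business-facing) work ethic
- Preferred qualifications: conference papers, competition awards, or high-quality technical blog posts in databases, big data, retrieval, cloud storage, or related fields; deep understanding of vector/RAG retrieval algorithms or retrieval optimization; deep understanding of large-model principles and endorsement of Baidu's large models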
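Responsibilities
- Design and develop core kernel features of the vector database
- Design and develop the cloud platform for the vector database product
- Develop and optimize vector retrieval, RAG retrieval, and other retrieval capabilities of the vector database
- Handle day-to-day operations and customer support for the vector database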
Includes English-language materials
Education
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Go
https://www.youtube.com/watch?v=8uiZC0l4Ajw
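A complete tutorial for learning Golang! Start to finish in under an hour, including a full demonstration of how to build an API in Go. No filler, just what you need to know.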
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for complete beginners, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Data structures
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data Structures and Algorithms full course tutorial in Java
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Unix
[English] The UNIX® Standard
https://www.opengroup.org/membership/forums/platform/unix
https://www.youtube.com/watch?v=IrDUcdpPmdI
UNIX is an operating system which was first developed in the 1970s, and has been under constant development ever since.
Linux
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Big data
https://www.youtube.com/watch?v=bAyrObl7TYE
https://www.youtube.com/watch?v=H4bf_uuMC-g
With all this talk of Big Data, we got Rebecca Tickle to explain just what makes data into Big Data.
MongoDB
https://learnxinyminutes.com/mongodb/
MongoDB is a NoSQL document database for high volume data storage.
https://studio3t.com/academy/#courses
The fastest way to learn MongoDB
https://www.youtube.com/watch?v=c2M-rlkkT5o
This video will give you an introduction to MongoDB in 1 hour. Afterwards I recommend exploring aggregation, replication, and sharding.
https://www.youtube.com/watch?v=ExcRbA7fy_A&list=PL4cUxeGkcC9h77dJ-QJlwGlZlTd4ecZOA
You'll learn how to use MongoDB (a NoSQL database) from scratch. You'll also learn how to integrate it into a simple Node.js API.
ElasticSearch
https://www.youtube.com/watch?v=a4HBKEda_F8
Learn about Elasticsearch with this comprehensive course designed for beginners, featuring both theoretical concepts and hands-on applications using Python (though applicable to any programming language). The course is structured in two parts: first covering essential Elasticsearch fundamentals including index management, document storage, text analysis, pipeline creation, search functionality, and advanced features like semantic search and embeddings; followed by a practical section where you'll build a real-world website using Elasticsearch as a search engine, working with the Astronomy Picture of the Day (APOD) dataset to implement features such as data cleaning pipelines, tokenization, pagination, and aggregations.
Redis
[English] Developer Hub
https://redis.io/dev/
Get all the tutorials, learning paths, and more you need to start building—fast.
https://www.runoob.com/redis/redis-tutorial.html
REmote DIctionary Server (Redis) is a key-value storage system written by Salvatore Sanfilippo; it is a cross-platform non-relational database.
https://www.youtube.com/watch?v=jgpVdJB2sKQ
In this video I will be covering Redis in depth from how to install it, what commands you can use, all the way to how to use it in a real world project.
Doris
https://doris.apache.org/docs/gettingStarted/what-is-apache-doris
ClickHouse
[English] Advanced Tutorial
https://clickhouse.com/docs/tutorial
Learn how to ingest and query data in ClickHouse using the New York City taxi example dataset.
https://www.youtube.com/watch?v=FtoWGT7kS-c
ClickHouse is an open-source column-oriented DBMS for online analytical processing that allows users to generate analytical reports using SQL queries in real-time.
https://www.youtube.com/watch?v=Rhe-kUyrFUE&list=PL0Z2YDlm0b3gcY5R_MUo4fT5bPqUQ66ep
Information retrieval
https://nlp.stanford.edu/IR-book/information-retrieval-book.html
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. 2008.
RAG
https://www.youtube.com/watch?v=sVcwVQRHIc8
Learn how to implement RAG (Retrieval Augmented Generation) from scratch, straight from a LangChain software engineer.
Large models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
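Related positions
Experienced hire · 3+ years of experience · Backend development
- Take part in R&D of the company's vector database: design and develop a next-generation distributed vector database system that supports core business scenarios such as AI, social, search, recommendation, advertising, and e-commerce
- Own full-lifecycle development and management of the product, including kernel design, development and testing, performance tuning, management and control, and documentation; evolve the system to meet business needs and deliver highly available, highly reliable, cost-effective vector services
- Learn from and adopt the industry's best technical and theoretical work, actively explore and expand new product capabilities, and continuously improve the product's technical competitiveness and service quality
Updated 2025-09-13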
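Experienced hire · A30370
1. Develop AI Search features for the distributed database, including vector retrieval, full-text retrieval, and LLM Function.
2. Build an AI Native database for AI search scenarios; design and implement systems with high concurrency, low latency, and high fault tolerance.
3. Track frontier Data+AI technologies, and identify and deliver opportunities for new technologies, including Agent and Text2SQL.
Updated 2025-03-19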
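Experienced hire · A50800A
1. Develop deeply customized vector database features and optimize the distributed architecture to deliver high-throughput, low-latency vector retrieval.
2. Resolve storage-engine performance bottlenecks in scenarios with massive volumes of vector data.
3. Build the Volcano Engine vector database platform: integrate cloud-native capabilities, build stability pipelines, and expand the ecosystem.
4. Propose solutions for specific requirements in vector retrieval business scenarios and drive them to delivery.
Updated 2025-03-05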