快手【留用实习】C++向量检索平台工程师
实习兼职J1013地点:北京状态:招聘
任职要求
1、本科及以上学历,计算机科学与技术、软件工程或相关专业方向; 2、熟练掌握C/C++,熟悉Python,熟悉Linux开发环境及内核,熟练掌握常用数据结构及算法,对计算机体系结构、操作系统及计算机网络有深入理解; 3、具有一定的算法基础,对机器学习,深度学习、多模态表征算法等有一定的基础; 4、熟悉分布式系统架构及实现,熟练使用常见分布式平台如Hadoop,Flink,Spark,Hive,HBase等,对源码有一定了解者优先;对faiss、milvus或其他向量索引算法有了解者优先 ; 5、有比较有影响力的github项目优先,有在SIGMOD,VLDB,ICDE等顶级会议或者期刊论文发表者优选。
工作职责
1、负责短视频多模态检索平台的研发工作; 2、针对大规模、实时、高并发等业务特点,持续对检索系统进行性能优化; 3、优化向量索引算法,提升检索效率和检索准确率; 4、利用检索平台的框架,解决业务中各种大规模向量计算的问题; 5、结合结构化检索和非结构化检索,处理复杂场景的检索问题。
包括英文材料
学历+
C+
https://www.freecodecamp.org/chinese/news/the-c-beginners-handbook/
本手册遵循二八定律。你将在 20% 的时间内学习 80% 的 C 编程语言。
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic C Programming language book written by Brian Kernighan and Dennis Ritchie.
C+++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Python+
https://liaoxuefeng.com/books/python/introduction/index.html
中文,免费,零起点,完整示例,基于最新的Python 3版本。
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Linux+
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
内核+
https://www.youtube.com/watch?v=C43VxGZ_ugU
I rummage around the Linux kernel source and try to understand what makes computers do what they do.
https://www.youtube.com/watch?v=HNIg3TXfdX8&list=PLrGN1Qi7t67V-9uXzj4VSQCffntfvn42v
Learn how to develop your very own kernel from scratch in this programming series!
https://www.youtube.com/watch?v=JDfo2Lc7iLU
Denshi goes over a simple explanation of what computer kernels are and how they work, alonside what makes the Linux kernel any special.
数据结构+
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data Structures and Algorithms full course tutorial java
算法+
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
机器学习+
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
深度学习+
https://d2l.ai/
Interactive deep learning book with code, math, and discussions.
分布式系统+
https://www.distributedsystemscourse.com/
The home page of a free online class in distributed systems.
https://www.youtube.com/watch?v=7VbL89mKK3M&list=PLOE1GTZ5ouRPbpTnrZ3Wqjamfwn_Q5Y9A
Hadoop+
https://www.runoob.com/w3cnote/hadoop-tutorial.html
Hadoop 为庞大的计算机集群提供可靠的、可伸缩的应用层计算和存储支持,它允许使用简单的编程模型跨计算机群集分布式处理大型数据集,并且支持在单台计算机到几千台计算机之间进行扩展。
[英文] Hadoop Tutorial
https://www.tutorialspoint.com/hadoop/index.htm
Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models.
Flink+
https://nightlies.apache.org/flink/flink-docs-release-2.0/docs/learn-flink/overview/
This training presents an introduction to Apache Flink that includes just enough to get you started writing scalable streaming ETL, analytics, and event-driven applications, while leaving out a lot of (ultimately important) details.
https://www.youtube.com/watch?v=WajYe9iA2Uk&list=PLa7VYi0yPIH2GTo3vRtX8w9tgNTTyYSux
Today’s businesses are increasingly software-defined, and their business processes are being automated. Whether it’s orders and shipments, or downloads and clicks, business events can always be streamed. Flink can be used to manipulate, process, and react to these streaming events as they occur.
Spark+
[英文] Learning Spark Book
https://pages.databricks.com/rs/094-YMS-629/images/LearningSpark2.0.pdf
This new edition has been updated to reflect Apache Spark’s evolution through Spark 2.x and Spark 3.0, including its expanded ecosystem of built-in and external data sources, machine learning, and streaming technologies with which Spark is tightly integrated.
Hive+
[英文] Hive Tutorial
https://www.tutorialspoint.com/hive/index.htm
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
https://www.youtube.com/watch?v=D4HqQ8-Ja9Y
HBase+
[英文] HBase Tutorial
https://www.tutorialspoint.com/hbase/index.htm
HBase is a data model that is similar to Google's big table designed to provide quick random access to huge amounts of structured data. This tutorial provides an introduction to HBase, the procedures to set up HBase on Hadoop File Systems, and ways to interact with HBase shell.
Faiss+
https://faiss.ai/index.html
Faiss is a library for efficient similarity search and clustering of dense vectors.
https://huggingface.co/learn/llm-course/en/chapter5/6
In this section we’ll use this information to build a search engine that can help us find answers to our most pressing questions about the library!
GitHub+
[英文] GitHub Learn
https://learn.github.com/
Discover a wide range of beginner-friendly tutorials, hands-on learning, and expert-led lessons.
相关职位
实习J1014
1、负责快手各产品后端系统、平台系统的研发工作,通过敏捷开发支持产品需求快速迭代,不断优化系统架构,支撑业务规模增长,保障服务稳定; 2、对现有系统的不足进行分析,找到目前系统的瓶颈,改进提高系统性能; 3、参与解决海量数据分布式处理、高效查询、数据一致性、准确性等方面带来的各种技术难题和挑战。
更新于 2025-05-20
实习J1014
1、建设承载快手亿级用户业务的网络基础设施及平台,包括但不限IDC核心网络、CDN/PCDN、边缘计算、四七层网关、流量智能调度、高性能网络产品、虚拟网络、智能网络、SDN等多个方向,持续改进系统稳定性、易用性、性能,保障业务高效迭代、稳定运行; 2、参与公司边缘计算、P2P系统的设计和实现,在确保用户体验的前提下,节约带宽成本,挑战公司海量用户的业务需求; 3、负责客户端/后端的开发实现、性能优化、不同平台、不同业务场景的适配工作; 4、分析并解决线下/线上测试中发现的软件缺陷; 5、结合业务需求,跟踪传输算法学术界和业界最新进展,推动拥塞控制算法的研究和落地在实际领域的优化和落地。
更新于 2025-03-04