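Meituan High-Performance Distributed Storage R&D Engineer
Experienced hire | Full-time | Core Local Commerce - Basic R&D Platform | Location: Beijing | Status: Hiring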
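Qualifications
1. Expert in C/C++, with programming experience in multithreading, high concurrency, and various I/O models; strongly self-motivated about performance optimization.
2. Familiar with distributed algorithms (such as Paxos and Raft) and computer networking (such as various RPC implementations and high-performance network send/receive libraries); understands how large-scale distributed storage systems work, along with the corresponding open-source implementations (for example gRPC, ZooKeeper, etcd, braft).
3. Development experience with file storage, block storage, and object storage (including but not limited to Ceph, GlusterFS, Lustre, ZFS, FUSE).
Preferred qualifications
1. Familiar with mainstream AI frameworks such as TensorFlow and PyTorch.
2. Deep understanding of the AI ecosystem, with the ability to integrate with and optimize data access workflows effectively.
3. Firsthand appreciation of the storage needs of high-performance computing scenarios, and awareness of the pain points and shortcomings of existing open-source storage projects in those scenarios.
4. Knowledge of cutting-edge industry technologies such as high-speed RDMA networking, user-space zero-copy storage with SPDK, high-performance NVMe storage media, and high-concurrency lock-free data structures.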
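Responsibilities
1. Lead the design and development of a high-performance distributed storage platform for AI scenarios.
2. For scenarios specific to the company, design and deliver a better-suited in-house storage architecture built on open-source modules; interfaces include but are not limited to distributed file storage and distributed object storage.
3. Research AI ecosystem integration and data access/processing acceleration, co-design with the storage layer to resolve performance bottlenecks, and build end-to-end competitiveness for AI scenarios.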
Includes English-language materials
C
https://www.freecodecamp.org/chinese/news/the-c-beginners-handbook/
This handbook follows the 80/20 rule. You will learn 80% of the C programming language in 20% of the time.
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic C Programming language book written by Brian Kernighan and Dennis Ritchie.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Multithreading
https://liaoxuefeng.com/books/java/threading/basic/index.html
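Compared with single-threaded programming, multithreaded programming is distinguished by the frequent need to read and write shared data, which in turn requires synchronization.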
https://www.youtube.com/watch?v=_uQgGS_VIXM&list=PLsc-VaxfZl4do3Etp_xQ0aQBoC-x5BIgJ
https://www.youtube.com/watch?v=IEEhzQoKtQU
https://www.youtube.com/watch?v=mTGdtC9f4EU&list=PLL8woMHwr36EDxjUoCzboZjedsnhLP1j4
https://www.youtube.com/watch?v=TPVH_coGAQs&list=PLk6CEY9XxSIAeK-EAh3hB4fgNvYkYmghp
https://www.youtube.com/watch?v=xPqnoB2hjjA
This video is an introduction to multithreading in modern C++.
https://www.youtube.com/watch?v=YKBwKy5PrpQ
Rust threading is easy to implement and improves the efficiency of your applications on multi-core systems!
High concurrency
https://www.baeldung.com/concurrency-principles-patterns
In this tutorial, we’ll discuss some of the design principles and patterns that have been established over time to build highly concurrent applications.
https://www.baeldung.com/java-concurrency
Handling concurrency in an application can be a tricky process with many potential pitfalls. A solid grasp of the fundamentals will go a long way to help minimize these issues.
https://www.oreilly.com/library/view/concurrency-in-go/9781491941294/
You’ll understand how Go chooses to model concurrency, what issues arise from this model, and how you can compose primitives within this model to solve problems.
https://www.oreilly.com/library/view/modern-concurrency-in/9781098165406/
With this book, you'll explore the transformative world of Java 21's key feature: virtual threads.
https://www.youtube.com/watch?v=qyM8Pi1KiiM
https://www.youtube.com/watch?v=wEsPL50Uiyo
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
Paxos
https://lamport.azurewebsites.net/pubs/paxos-simple.pdf
The Paxos algorithm for implementing a fault-tolerant distributed system has been regarded as difficult to understand, perhaps because the original presentation was Greek to many readers.
https://paxos.systems/
Paxos algorithms are a family of consensus algorithms (or protocols) that are used in distributed systems to achieve consensus in the presence of crash failures.
https://www.scylladb.com/glossary/paxos-consensus-algorithm/
Paxos is a family of protocols for solving the problem of consensus in distributed networks.
Raft
https://raft.github.io/
Raft is a consensus algorithm that is designed to be easy to understand.
https://thesecretlivesofdata.com/raft/
Understandable Distributed Consensus
https://www.youtube.com/watch?v=ZyqAbQkpeUo
Sail into the world of distributed systems with our in-depth, Raft consensus algorithm tutorial.
RPC
https://javaguide.cn/distributed-system/rpc/rpc-intro.html
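Why RPC? Because the methods provided by services on two different servers do not live in the same memory space, the arguments needed for a method call have to be passed through network programming. The result of the method call likewise has to be received through network programming.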
https://www.youtube.com/watch?v=S2osKiqQG9s
This video is part of an 8-lecture series on distributed systems, given as part of the undergraduate computer science course at the University of Cambridge.
gRPC
[English] Introduction to gRPC
https://grpc.io/docs/what-is-grpc/introduction/
An introduction to gRPC and protocol buffers.
etcd
[English] A Guide to etcd
https://www.baeldung.com/java-etcd-guide
In this comprehensive tutorial, we’ll delve into etcd, an open-source distributed key-value store.
Ceph
https://docs.ceph.com/en/squid/start/beginners-guide/
The purpose of A Beginner’s Guide to Ceph is to make Ceph comprehensible.
https://www.youtube.com/watch?v=oEKJnHAfSiw
TensorFlow
https://www.youtube.com/watch?v=tpCFfeUEGs8
Ready to learn the fundamentals of TensorFlow and deep learning with Python? Well, you’ve come to the right place.
https://www.youtube.com/watch?v=ZUKz4125WNI
This part continues right where part one left off so get that Google Colab window open and get ready to write plenty more TensorFlow code.
PyTorch
https://datawhalechina.github.io/thorough-pytorch/
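PyTorch is an important tool for data science research with deep learning. It offers considerable advantages in flexibility, readability, and performance, and in recent years it has become the framework most commonly used in academia to implement deep learning algorithms.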
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
Data structures
https://www.youtube.com/watch?v=8hly31xKli0
In this course you will learn about algorithms and data structures, two of the fundamental topics in computer science.
https://www.youtube.com/watch?v=B31LgI4Y4DQ
Learn about data structures in this comprehensive course. We will be implementing these data structures in C or C++.
https://www.youtube.com/watch?v=CBYHwZcbD-s
Data Structures and Algorithms full course tutorial java
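Related positions
Experienced hire | 5-7 years | Software R&D
1. Build and maintain a millisecond-latency, high-performance distributed storage system, continuously optimizing for performance, cost, and reliability.
2. Design and develop high-performance storage applications for AI training.
3. Handle production operations of the distributed storage system, data analysis, and resolution of hard production issues.
Updated 2024-11-08
Experienced hire | 5+ years | D7194
1. Provide highly stable, low-cost, high-performance distributed file system services for workloads such as offline computing, real-time computing, data lakes, AI training, and messaging systems.
2. Solve the challenging problems that ultra-large-scale distributed file systems face in metadata management, cluster management, cost reduction, stability, performance, and operability.
Updated 2025-02-12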