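Alibaba Cloud: Alibaba Cloud Intelligence - Server Hardware/Software Co-design R&D Expert - Hangzhou/Shenzhen
Experienced hire, full-time, 5+ years, Cloud Intelligence Group. Location: Shenzhen | Hangzhou. Status: Hiring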
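Requirements
1. Degree in electronics, computer science, automation, or a related field.
2. Expert in C/C++ programming; familiar with Linux internals and implementation principles (device drivers, I/O subsystem, network subsystem, etc.); experience developing kernel or user-space drivers.
3. Familiar with a range of network protocols and networking fundamentals, with the principles and implementation of TCP/IP, and with DPDK or similar.
4. Familiar with implementing and optimizing RDMA high-performance network protocols on SmartNICs.
5. Good communication skills, passion for technology, and a self-driven learner able to pick up new technologies quickly.
Preferred skills:
1. Solid command of how the RDMA RoCE v2 protocol works, including troubleshooting methods, performance tuning, and programming practice.
2. Hands-on development or tuning experience with RoCE v2 drivers, and extensive experience troubleshooting production network issues.
3. Familiar with SmartNIC hardware/software co-optimization techniques; hands-on experience preferred.
4. Familiar with the features and design of mainstream RDMA SmartNICs, including but not limited to Mellanox and Broadcom.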
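Responsibilities
1. Develop and implement NIC drivers and RDMA drivers for SmartNICs.
2. Drive hardware/software co-optimization and innovation research for SmartNICs in areas such as AI computing and storage.
3. Through SmartNIC hardware/software innovation and optimization, including hardware offload optimization of high-performance network protocols, help cloud product infrastructure keep improving its technical competitiveness.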
Includes English-language materials
C
https://www.freecodecamp.org/chinese/news/the-c-beginners-handbook/
This handbook follows the 80/20 rule. You will learn 80% of the C programming language in 20% of the time.
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic C Programming language book written by Brian Kernighan and Dennis Ritchie.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Linux
https://ryanstutorials.net/linuxtutorial/
Ok, so you want to learn how to use the Bash command line interface (terminal) on Unix/Linux.
https://ubuntu.com/tutorials/command-line-for-beginners
The Linux command line is a text interface to your computer.
https://www.youtube.com/watch?v=6WatcfENsOU
In this Linux crash course, you will learn the fundamental skills and tools you need to become a proficient Linux system administrator.
https://www.youtube.com/watch?v=v392lEyM29A
Never fear the command line again, make it fear you.
https://www.youtube.com/watch?v=ZtqBQ68cfJc
Kernel
https://www.youtube.com/watch?v=C43VxGZ_ugU
I rummage around the Linux kernel source and try to understand what makes computers do what they do.
https://www.youtube.com/watch?v=HNIg3TXfdX8&list=PLrGN1Qi7t67V-9uXzj4VSQCffntfvn42v
Learn how to develop your very own kernel from scratch in this programming series!
https://www.youtube.com/watch?v=JDfo2Lc7iLU
Denshi goes over a simple explanation of what computer kernels are and how they work, alongside what makes the Linux kernel special.
TCP/IP
[English] What is TCP/IP?
https://www.techtarget.com/searchnetworking/definition/TCP-IP
TCP/IP stands for Transmission Control Protocol/Internet Protocol and is a suite of communication protocols used to interconnect network devices on the internet.
Performance tuning
https://goperf.dev/
The Go App Optimization Guide is a series of in-depth, technical articles for developers who want to get more performance out of their Go code without relying on guesswork or cargo cult patterns.
https://web.dev/learn/performance
This course is designed for those new to web performance, a vital aspect of the user experience.
https://www.ibm.com/think/insights/application-performance-optimization
Application performance is not just a simple concern for most organizations; it’s a critical factor in their business’s success.
https://www.oreilly.com/library/view/optimizing-java/9781492039259/
Performance tuning is an experimental science, but that doesn’t mean engineers should resort to guesswork and folklore to get the job done.
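Related positions
Experienced hire, 5+ years, Technical - Development
1. Following the public cloud's unified architecture, OpenAPI specifications, software technology stack, and delivery and operations system, take responsibility for R&D and delivery of dedicated cloud products. Participate deeply in the R&D process for Lingjun bare-metal servers and EGS cloud compute servers, including hardware architecture pre-research, solution design, hardware/software co-optimized system tuning, production service quality assurance, and expert technical support, ensuring efficient management across the full product lifecycle from R&D to operations.
2. Track GPU architecture trends and explore cutting-edge GPU architecture design techniques. Work with the high-performance networking team to design the network interconnect architecture; for distributed training and inference workloads, look for new avenues of performance optimization in hardware/software co-design and high-performance networking, building the core competitiveness of Alibaba Cloud's accelerated computing cloud servers.
3. Develop and continuously improve system stability and security, ensure safe and reliable platform operation, and keep raising external service quality standards.
Updated 2025-06-18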
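Experienced hire, 5+ years, Cloud Intelligence Group
1. Own the architecture and system solution design of FPGA/chip products, define hardware/software interfaces and the FPGA logic architecture, and carry out full-lifecycle R&D covering logic design and development, testing, rollout, and operations.
2. Own performance optimization and stability assurance for the relevant FPGAs/chips, continuously improve NIC interconnect performance and stability, and ensure secure, stable, and efficient system operation.
3. Participate in pre-research and planning of new technologies such as NIC interconnect, track business needs and industry technology changes, and carry out product planning and FPGA architecture evolution, including next-generation virtual networking, hardware/software co-design, high-performance transport protocols, and AI Scale UP and Scale Out networks.
Updated 2025-09-03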
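Experienced hire, Backend development
1. Build and maintain the base operating system environment; own the stable operation of low-level OS modules on production servers.
2. Optimize the operating system, kernel, servers, and other runtime environments to improve Xiaohongshu's overall business performance.
3. Combine hardware/software and k8s scheduling techniques to provide systematic solutions that improve stability and reduce cost for upper-layer applications, including but not limited to workload co-location and overselling.
4. Build a real-time, stable end-to-end tracing system focused on fault isolation across Linux, servers, switches, and other infrastructure.
Updated 2025-09-13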