Alibaba DAMO Academy - Toolchain Expert - Computing Technology
Experienced hire | Full-time | 3+ years of experience | Technology - Chips | Location: Hangzhou, Shanghai | Status: Hiring
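Requirements
1. Bachelor's degree or above in computer science, software engineering, electronic engineering, or a related field; 5+ years of experience in system software or toolchain development, including at least 2 years focused on GPU/accelerator toolchains or high-performance computing infrastructure.
2. Proficiency in C/C++ (low-level data collection, GDB extensions, CoreDump parsing) and Python (upper-layer CLI, SDK, data analysis); deep understanding of the Linux kernel debugging subsystems (perf, ftrace, eBPF, kgdb), PCIe error handling, and the device driver model.
3. Familiarity with Kubernetes extension mechanisms (Device…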
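Responsibilities
As a toolchain expert, you will own the core architecture design and the delivery of key modules for the AI chip developer tool suite, covering four tool lines: performance tuning (Insight), debugging and diagnostics (GDB/CoreDump enhancements), system management (DCGM/SMI), and cloud-native operations (K8S GPU Operator). Your core mission is to move the toolchain, within the Tapeout window, from single-machine usability to a cluster-ready, programmable, productized offering that is benchmarked against the NVIDIA Nsight/DCGM/NVML ecosystem, and to build an integrated Profiling → Debugging → Root Cause Analysis workflow. You will also reserve standardized interfaces for AI Agent integration at the architecture level, so that upper-layer automation systems and intelligent agents can reliably invoke the toolchain's capabilities, ultimately turning internal tools into developer products that can be delivered externally.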
C
https://www.freecodecamp.org/chinese/news/the-c-beginners-handbook/
This handbook follows the 80/20 rule: you will learn 80% of the C programming language in 20% of the time.
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic C Programming language book written by Brian Kernighan and Dennis Ritchie.
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
GDB
[English] Debugging with GDB
https://betterexplained.com/articles/debugging-with-gdb/
A debugger lets you pause a program, examine and change variables, and step through code.
https://code.visualstudio.com/docs/cpp/cpp-debug
After you have set up the basics of your debugging environment as specified in the configuration tutorials for each target compiler/platform, you can learn more details about debugging C/C++ in this section.
https://opensource.com/article/21/3/debug-code-gdb
Troubleshoot your code with the GNU Debugger.
https://www.brendangregg.com/blog/2016-08-09/gdb-example-ncurses.html
gdb is the GNU Debugger, the standard debugger on Linux.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, starts from zero, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
CLI
https://developer.mozilla.org/en-US/docs/Learn_web_development/Getting_started/Environment_setup/Command_line
In your development process, you'll undoubtedly be required to run some commands in the terminal (or on the "command line" — these are effectively the same thing).
https://www.youtube.com/watch?v=dfTpFFZwazI
In this video, let's look at how to create CLI scripts with JavaScript and Node.js.
https://www.youtube.com/watch?v=zPYjfgxYO7k
In this video we'll build a simple golang cli app that shows the weather forecast for the day.
SDK
https://www.ibm.com/think/topics/api-vs-sdk
Learn about software development kits (SDKs) and application programming interfaces (APIs) and how they improve both software development cycles and the end-user experience (UX).
https://www.redhat.com/zh-cn/topics/cloud-native-apps/what-is-SDK
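A software development kit (SDK) is a set of tools typically provided by the maker of a hardware platform, operating system (OS), or programming language.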