NVIDIA Software Engineer, AI and DL Kernel Libraries
Experienced hire | Full-time | Location: Shanghai | Status: Hiring
Requirements
• Master's degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
• 3+ years of relevant industry, research, or systems software development experience in machine learning, deep learning systems, compilers, or GPU software. More experience is expected for senior-level candidates.
• Strong programming skills in C/C++ and Python, with hands-on experience developing high-performance software.
• Solid experience with CUDA development and GPU programming fundamentals.
• Strong experience developing or using deep learning frameworks such as PyTorch, JAX, TensorFlow, or ONNX.
• Good understanding of linear algebra, performance analysis, profiling, and code optimization.
• Experience designing software abstractions, APIs, or higher-level system architecture for performance-sensitive systems.
• Familiarity with modern machine learning…
Responsibilities
• Develop production-quality software that ships as part of NVIDIA's AI software stack, including cuDNN, FlashInfer, and optimized support for large language model inference workloads.
• Innovate and develop new AI systems technologies for efficient inference, with a focus on performance, scalability, maintainability, and usability.
• Design, implement, and optimize kernels for high-impact AI workloads across LLM inference, generative AI, computer vision, autonomous driving, and recommender systems.
• Design and implement extensible software abstractions for deep learning libraries, LLM serving engines, and runtime systems.
• Build and improve just-in-time compilation, code generation, and runtime technologies for performance-critical GPU workloads.
• Analyze workload performance, tune current software, and propose improvements to future software and hardware-software interfaces.
• Collaborate closely with engineers across deep learning frameworks, libraries, kernels, compilers, and GPU architecture teams at NVIDIA.
• Contribute to open-source communities and ecosystem integrations where relevant, including projects such as FlashInfer, vLLM, and SGLang.
Includes English-language materials
C
https://www.freecodecamp.org/chinese/news/the-c-beginners-handbook/
This handbook follows the 80/20 rule: you will learn 80% of the C programming language in 20% of the time.
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic C Programming language book written by Brian Kernighan and Dennis Ritchie.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, suitable for complete beginners, with complete examples, based on the latest Python 3 version.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction to all of the core concepts in Python.
CUDA
https://developer.nvidia.com/blog/even-easier-introduction-cuda/
This post is a super simple introduction to CUDA, the popular parallel computing platform and programming model from NVIDIA.
https://www.youtube.com/watch?v=86FAWCzIe_4
Learn how to program with NVIDIA CUDA and leverage GPUs for high-performance computing and deep learning.
PyTorch
https://datawhalechina.github.io/thorough-pytorch/
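PyTorch is an important tool for data science research with deep learning. It offers considerable advantages in flexibility, readability, and performance, and in recent years it has become the framework most commonly used in academia to implement deep learning algorithms.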
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
JAX
https://docs.jax.dev/en/latest/notebooks/thinking_in_jax.html
JAX is a library for array-oriented numerical computation, with automatic differentiation and JIT compilation to enable high-performance machine learning research.
TensorFlow
https://www.youtube.com/watch?v=tpCFfeUEGs8
Ready to learn the fundamentals of TensorFlow and deep learning with Python? Well, you’ve come to the right place.
https://www.youtube.com/watch?v=ZUKz4125WNI
This part continues right where part one left off so get that Google Colab window open and get ready to write plenty more TensorFlow code.
ONNX
https://github.com/onnx/tutorials
Open Neural Network Exchange (ONNX) is an open standard format for representing machine learning models.
[English] Introduction to ONNX
https://onnx.ai/onnx/intro/
This documentation describes the ONNX concepts (Open Neural Network Exchange).