NVIDIA Software Engineer, AI and DL Kernel Libraries

Experienced hire, full-time | 2026-06-24 | Location: Shanghai | Status: Hiring

Requirements

• Master's degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
• 3+ years of relevant industry, research, or systems software development experience in machine learning, deep learning systems, compilers, or GPU software. More experience is expected for senior-level candidates.
• Strong programming skills in C/C++ and Python, with hands-on experience developing high-performance software.
• Solid experience with CUDA development and GPU programming fundamentals.
• Strong experience developing or using deep learning frameworks such as PyTorch, JAX, TensorFlow, or ONNX.
• Good understanding of linear algebra, performance analysis, profiling, and code optimization.
• Experience designing software abstractions, APIs, or higher-level system architecture for performance-sensitive systems.
• Familiarity with modern machine learning…

Responsibilities

• Develop production-quality software that ships as part of NVIDIA's AI software stack, including cuDNN, FlashInfer, and optimized support for large language model inference workloads.
• Innovate and develop new AI systems technologies for efficient inference, with a focus on performance, scalability, maintainability, and usability.
• Design, implement, and optimize kernels for high-impact AI workloads across LLM inference, generative AI, computer vision, autonomous driving, and recommender systems.
• Design and implement extensible software abstractions for deep learning libraries, LLM serving engines, and runtime systems.
• Build and improve just-in-time compilation, code generation, and runtime technologies for performance-critical GPU workloads.
• Analyze workload performance, tune current software, and propose improvements to future software and hardware-software interfaces.
• Collaborate closely with engineers across deep learning frameworks, libraries, kernels, compilers, and GPU architecture teams at NVIDIA.
• Contribute to open-source communities and ecosystem integrations where relevant, including projects such as FlashInfer, vLLM, and SGLang.
