NVIDIA: Software Engineer, AI and DL Kernel Libraries

Experienced hire, full-time. Location: Shanghai. Status: Hiring.

Requirements


• Master's degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
• 3+ years of relevant industry, research, or systems software development experience in machine learning, deep learning systems, compilers, or GPU software. More experience is expected for senior-level candidates.
• Strong programming skills in C/C++ and Python, with hands-on experience developing high-performance software.
• Solid experience with CUDA development and GPU programming fundamentals.
• Strong experience developing or using deep learning frameworks such as PyTorch, JAX, TensorFlow, or ONNX.
• Good understanding of linear algebra, performance analysis, profiling, and code optimization.
• Experience designing software abstractions, APIs, or higher-level system architecture for performance-sensitive systems.
• Familiarity with modern machine learning…

Responsibilities


• Develop production-quality software that ships as part of NVIDIA's AI software stack, including cuDNN, FlashInfer, and optimized support for large language model inference workloads.
• Innovate and develop new AI systems technologies for efficient inference, with a focus on performance, scalability, maintainability, and usability.
• Design, implement, and optimize kernels for high-impact AI workloads across LLM inference, generative AI, computer vision, autonomous driving, and recommender systems.
• Design and implement extensible software abstractions for deep learning libraries, LLM serving engines, and runtime systems.
• Build and improve just-in-time compilation, code generation, and runtime technologies for performance-critical GPU workloads.
• Analyze workload performance, tune current software, and propose improvements to future software and hardware-software interfaces.
• Collaborate closely with engineers across deep learning frameworks, libraries, kernels, compilers, and GPU architecture teams at NVIDIA.
• Contribute to open-source communities and ecosystem integrations where relevant, including projects such as FlashInfer, vLLM, and SGLang.