NVIDIA Senior Deep Learning Compiler Engineer - CUDA

Experienced hire · Full-time · 2025-11-11 · Location: Shanghai | Beijing · Status: Hiring

Requirements

• Master's or PhD, or equivalent experience, in a relevant discipline (CE, CS&E, CS, AI)
• 4+ years of relevant work experience
• Excellent C/C++ programming and software engineering skills; an ACM background is a plus
• Good fundamental knowledge of computer architecture
• Strong ability to abstract problems and a sound methodology for resolving them
• A strong compiler background, including MLIR/TVM/Triton/LLVM, is desired
• Good knowledge of GPU architecture a…

Responsibilities

• Design and implement the DSL and core compiler of a tile-aware GPU programming model for emerging GPU architectures
• Continuously innovate and iterate on the core architecture of the compiler to consistently optimize performance
• Investigate next-generation GPU architectures and provide solutions in the DSL and compiler stack
• Analyze performance on emerging AI/LLM workloads and integrate with AI/ML frameworks

Related positions

Deep Learning Performance Software Engineer

Experienced hire

We are now looking for a Deep Learning Performance Software Engineer! We are expanding our research and development for deep learning. We seek excellent Software Engineers and Senior Software Engineers to join our team. We specialize in developing GPU-accelerated deep learning software. Researchers around the world are using NVIDIA GPUs to power a revolution in deep learning, enabling breakthroughs in numerous areas. Join the team that builds software to enable new solutions. The ability to work in a fast-paced, customer-oriented team is required, and excellent communication skills are necessary.

What you’ll be doing:

• Develop the deep learning compiler
• Develop highly optimized deep learning kernels
• End-to-end performance optimization
• Performance optimization, analysis, and tuning

Updated 2025-09-24 · Shanghai | Beijing

Senior Performance Software Engineer, Deep Learning Libraries

Experienced hire

• Writing highly tuned compute kernels to perform core deep learning operations (e.g. matrix multiplies, convolutions, normalizations)
• Following general software engineering best practices including support for regression testing and CI/CD flows
• Collaborating with teams across NVIDIA:
  • CUDA compiler team on generating optimal assembly code
  • Deep learning training and inference performance teams on which layers require optimization
  • Hardware and architecture teams on the programming model for new deep learning hardware features

Updated 2025-09-24 · Shanghai | Beijing

Senior System Software Engineer - AI Performance and Efficiency Tools

Experienced hire

A key part of NVIDIA's strength is our sophisticated analysis and debugging tools, which empower NVIDIA engineers to improve the performance and power efficiency of our products and the applications running on them. We are looking for forward-thinking, hard-working, and creative people to join a multifaceted software team with high standards! This software engineering role involves developing tools for AI researchers and SW/HW teams running AI workloads on GPU clusters. As a member of the software development team, you will work with users from different departments, such as architecture and software teams. Our work gives users intuitive, rich, and accurate insight into the workload and the system, and empowers them to find opportunities in software and hardware, build high-level models to propose and deliver the best hardware and software to our customers, and debug tricky failures and issues to help improve the performance and efficiency of the system.

What you’ll be doing:

• Build internal profiling and analysis tools for AI workloads at large scale
• Build debugging tools for commonly encountered problems, such as memory or networking issues
• Create benchmarking and simulation technologies for AI systems or GPU clusters
• Partner with HW architects to propose new features or improve existing features with real-world use cases

Updated 2025-06-19 · Shanghai

Senior Manager, Deep Learning Performance Architecture

Experienced hire

In this role, you will manage a team of experienced deep learning performance architects who analyze deep learning networks and push the evolution of our deep learning computing system through a hardware/software co-design approach. You will establish team objectives to meet schedules and goals, establish and evolve policies and procedures that affect the immediate organization, and communicate with senior management on team vision and development. You’ll collaborate with members of the deep learning software framework teams and the hardware architecture teams to accelerate the next generation of deep learning computing systems. The scope of your team's efforts includes characterizing deep learning workloads, performance tuning and analysis, optimizing the present generation of our software tech stack, driving the evolution of the next generation of deep learning hardware and software architecture, and other general engineering management work.

Updated 2025-12-11 · Shanghai | Beijing