NVIDIA Deep Learning Performance Architect - Perf Tools

Experienced hire | Full-time | 2025-12-24 | Location: Shanghai | Beijing | Status: Hiring

Requirements

• BS+ in Computer Science, Electronic Engineering, or a related field (or equivalent experience)
• 4+ years of software development experience
• Strong software skills in design, coding (C++ and Python), analysis, and debugging of low-level programs
• Strong grasp of computer architecture (pipelines, memory hierarchies) and operating system fundamentals
• Experience with performance modeling, architecture simulation, profiling, and analysis
• Self-starter who thrives in dynamic environments and manages competing priorities effectively
  
Ways to stand out from the crowd:
 • Exper…

Responsibilities

• Architect Performance Tooling: Develop infrastructure tools and libraries for GPU performance analysis, visualization, and automated workflows used across the GPU SW/HW development life cycle.
• Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance performance debugging and profiling capabilities.
• AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate performance optimization guidance, and improve the user experience of the profiling infrastructure.
• Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to support performance studies, improve the CUDA software stack, and co-design performance-centric solutions for current and next-generation GPU architectures.

Related positions

Senior System Software Engineer - AI Performance and Efficiency Tools

Experienced hire

A key part of NVIDIA's strength is our sophisticated analysis and debugging tools, which empower NVIDIA engineers to improve the performance and power efficiency of our products and the applications running on them. We are looking for forward-thinking, hard-working, and creative people to join a multifaceted software team with high standards! This software engineering role involves developing tools for AI researchers and SW/HW teams running AI workloads on GPU clusters. As a member of the software development team, you will work with users from different departments, such as Architecture and Software teams. Our work gives users intuitive, rich, and accurate insight into the workload and the system, and empowers them to find opportunities in software and hardware, build high-level models to propose and deliver the best hardware and software to our customers, and debug tricky failures and issues to help improve the performance and efficiency of the system.

What you'll be doing:
• Build internal profiling and analysis tools for AI workloads at large scale
• Build debugging tools for commonly encountered problems, such as memory or networking issues
• Create benchmarking and simulation technologies for AI systems or GPU clusters
• Partner with HW architects to propose new features or improve existing features based on real-world use cases

Updated 2025-06-19 | Shanghai

AI Computing Performance Architect, Perf Analysis and Kernel Dev

Experienced hire

• Design, develop, and optimize major layers in LLMs (e.g., attention, GEMM, inter-GPU communication) for NVIDIA's new architectures.
• Implement and fine-tune kernels to achieve optimal performance on NVIDIA GPUs.
• Conduct in-depth performance analysis of GPU kernels, including attention and other critical operations.
• Identify bottlenecks, optimize resource utilization, and improve throughput and power efficiency.
• Create and maintain workloads and micro-benchmark suites to evaluate kernel performance across various hardware and software configurations.
• Generate performance projections, comparisons, and detailed analysis reports for internal and external stakeholders.
• Collaborate with architecture, software, and product teams to guide the development of next-generation deep learning hardware and software.

Updated 2025-09-03 | Shanghai

AI Computing Performance Architect Intern, Perf Analysis and Kernel Dev - 2026

Internship

Updated 2026-01-20 | Shanghai | Beijing

Deep Learning Performance Architect

Experienced hire

We are now looking for a Deep Learning Performance Software Engineer! We are expanding our research and development for inference. We seek excellent Software Engineers and Senior Software Engineers to join our team. We specialize in developing GPU-accelerated deep learning software. Researchers around the world are using NVIDIA GPUs to power a revolution in deep learning, enabling breakthroughs in numerous areas. Join the team that builds software to enable new solutions. Collaborate with the deep learning community to implement the latest algorithms for public release in TensorRT. The ability to work in a fast-paced, customer-oriented team is required, and excellent communication skills are necessary.

What you'll be doing:
• Develop highly optimized deep learning kernels for inference
• Do performance optimization, analysis, and tuning
• Work with cross-collaborative teams across automotive, image understanding, and speech understanding to develop innovative solutions
• Occasionally travel to conferences and customers for technical consultation and training

Updated 2025-09-23 | Shanghai | Beijing