
NVIDIA Deep Learning Performance Architect - Perf Tools

Experienced hire · Full-time · Location: Shanghai · Status: Hiring

Requirements


• BS or higher in Computer Science, Electronic Engineering, or a related field (or equivalent experience)
• 4+ years of software development experience
• Strong software skills in design, coding (C++ and Python), analysis, and debugging of low-level programs
• Strong grasp of computer architecture (pipelines, memory hierarchies) and operating system fundamentals
• Experience with performance modeling, architecture simulation, profiling, and analysis
• Self-starter who thrives in dynamic environments and manages competing priorities effectively
  
Ways to stand out from the crowd:
• Experience building performance debugging and analysis tools on silicon and simulators. Experience developing application snapshot and replay tools is a big plus.
• Familiarity with the CUDA system software stack (e.g., CUDA Driver/Runtime APIs), CUDA kernel optimization, and GPU architecture
• Familiarity with GPU performance profiling tools such as Nsight Systems, Nsight Compute, and NVTX, or experience developing similar tools for other processors
• Practical experience or projects demonstrating AI/ML-based code generation, automated data analysis, or workflow assistants

Responsibilities


• Architect Performance Tooling: Develop infrastructure tools and libraries for GPU performance analysis, visualization, and automated workflows used across the GPU SW/HW development life cycle.
• Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance performance debugging and profiling capabilities.
• AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate performance optimization guidance, and improve the user experience of the profiling infrastructure.
• Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to support performance studies, improve the CUDA software stack, and co-design performance-centric solutions for current and next-generation GPU architectures.
Related positions

Experienced hire

A key part of NVIDIA's strength is our sophisticated analysis and debugging tools, which empower NVIDIA engineers to improve the performance and power efficiency of our products and the applications running on them. We are looking for forward-thinking, hard-working, and creative people to join a multifaceted software team with high standards! This software engineering role involves developing tools for AI researchers and SW/HW teams running AI workloads on GPU clusters. As a member of the software development team, you will work with users from different departments, such as architecture and software teams. Our work gives users intuitive, rich, and accurate insight into the workload and the system, and empowers them to find opportunities in software and hardware, build high-level models to propose and deliver the best hardware and software to our customers, and debug tricky failures and issues to help improve the performance and efficiency of the system.

What you’ll be doing:
• Build internal profiling and analysis tools for AI workloads at large scale
• Build debugging tools for commonly encountered problems, such as memory or networking issues
• Create benchmarking and simulation technologies for AI systems or GPU clusters
• Partner with HW architects to propose new features or improve existing features based on real-world use cases

Updated 2025-06-19
Experienced hire

• Design, develop, and optimize major layers in LLMs (e.g., attention, GEMM, inter-GPU communication) for NVIDIA's new architectures.
• Implement and fine-tune kernels to achieve optimal performance on NVIDIA GPUs.
• Conduct in-depth performance analysis of GPU kernels, including attention and other critical operations.
• Identify bottlenecks, optimize resource utilization, and improve throughput and power efficiency.
• Create and maintain workloads and micro-benchmark suites to evaluate kernel performance across various hardware and software configurations.
• Generate performance projections, comparisons, and detailed analysis reports for internal and external stakeholders.
• Collaborate with architecture, software, and product teams to guide the development of next-generation deep learning hardware and software.

Updated 2025-09-03
Experienced hire

We are now looking for a Deep Learning Performance Software Engineer! We are expanding our research and development for inference. We seek excellent Software Engineers and Senior Software Engineers to join our team. We specialize in developing GPU-accelerated deep learning software. Researchers around the world are using NVIDIA GPUs to power a revolution in deep learning, enabling breakthroughs in numerous areas. Join the team that builds software to enable new solutions. Collaborate with the deep learning community to implement the latest algorithms for public release in TensorRT. The ability to work in a fast-paced, customer-oriented team is required, and excellent communication skills are necessary.

What you’ll be doing:
• Develop highly optimized deep learning kernels for inference
• Do performance optimization, analysis, and tuning
• Work with cross-collaborative teams across automotive, image understanding, and speech understanding to develop innovative solutions
• Occasionally travel to conferences and customers for technical consultation and training

Updated 2025-09-23
Experienced hire

NVIDIA is developing processors and system architectures that accelerate deep learning on edge devices, workstations, and data center GPUs for a variety of applications, including automotive, robotics, large language models (LLMs), and generative AI models. We are looking for an expert deep learning system performance architect to join our modeling, efficiency optimization, performance projection, and analysis effort. In this position, you will have the chance to optimize deep learning hardware and software architecture and make a significant impact in a dynamic, technology-focused company.

What you’ll be doing:
• Analyze the performance and efficiency of various machine learning/deep learning algorithms on different architectures
• Identify architecture and software performance bottlenecks and propose optimizations
• Explore new features and hardware capabilities for deep learning applications

Updated 2025-09-03