NVIDIA AI Computing Performance Architect, Perf Analysis and Kernel Dev
Requirements
• MS or PhD in relevant discipline (CS, EE, Math)
• 3+ years of industry experience in GPU programming or performance optimization for DL applications.
• Demonstrated experience in analyzing and improving the performance of GPU kernels, with measurable results (e.g. performance improvements, efficiency gains).
• Strong programming skills in C, C++, Perl, or Python
• Strong background in computer architecture
• Excel…
Responsibilities
• Design, develop, and optimize major layers in LLMs (e.g., attention, GEMM, inter-GPU communication) for NVIDIA's new architectures.
• Implement and fine-tune kernels to achieve optimal performance on NVIDIA GPUs.
• Conduct in-depth performance analysis of GPU kernels, including attention and other critical operations.
• Identify bottlenecks, optimize resource utilization, and improve throughput and power efficiency.
• Create and maintain workloads and micro-benchmark suites to evaluate kernel performance across various hardware and software configurations.
• Generate performance projections, comparisons, and detailed analysis reports for internal and external stakeholders.
• Collaborate with architecture, software, and product teams to guide the development of next-generation deep learning hardware and software.
• Architect Performance Tooling: Develop infrastructure tools/libraries for GPU performance analysis, visualization, and automated workflows used across the GPU SW/HW development life cycle.
• Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance performance debugging and profiling capabilities.
• AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate performance optimization guidance, and improve the user experience of the profiling infrastructure.
• Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to support performance studies, improve the CUDA software stack, and co-design performance-centric solutions for current and next-generation GPU architectures.
NVIDIA is developing processor and system architectures that accelerate deep learning and high-performance computing applications. We are looking for an intern deep learning system performance architect to join our AI performance modelling, analysis, and optimization efforts. In this position, you will have a chance to work on DL performance modelling, analysis, and optimization on state-of-the-art hardware architectures for various LLM workloads. You will contribute to our dynamic, technology-focused company.
What you’ll be doing:
• Analyze state-of-the-art DL networks (LLMs, etc.), and identify and prototype performance opportunities to influence the SW and Architecture teams for NVIDIA's current and next-gen inference products.
• Develop analytical models for state-of-the-art deep learning networks and algorithms to drive innovation in processor and system architecture design for performance and efficiency.
• Specify hardware/software configurations and metrics to analyze performance, power, and accuracy in existing and future uniprocessor and multiprocessor configurations.
• Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, software, and product teams.
NVIDIA is developing processor and system architectures that accelerate deep learning and high-performance computing applications. We are looking for an expert deep learning system performance architect to join our AI performance modelling, analysis, and optimization efforts. In this position, you will have a chance to work on DL performance modelling, analysis, and optimization on state-of-the-art hardware architectures for various LLM workloads. You will contribute to our dynamic, technology-focused company.
What you’ll be doing:
• Analyze state-of-the-art DL networks (LLMs, etc.), and identify and prototype performance opportunities to influence the SW and Architecture teams for NVIDIA's current and next-gen inference products.
• Develop analytical models for state-of-the-art deep learning networks and algorithms to drive innovation in processor and system architecture design for performance and efficiency.
• Specify hardware/software configurations and metrics to analyze performance, power, and accuracy in existing and future uniprocessor and multiprocessor configurations.
• Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, software, and product teams.