NVIDIA AI Computing Performance Architect, Perf Analysis and Kernel Dev

Experienced hire · Full-time · 2025-09-03 · Location: Shanghai · Status: Hiring

Qualifications

• MS or PhD in relevant discipline (CS, EE, Math)
• 3+ years of industry experience in GPU programming or performance optimization for DL applications.
• Demonstrated experience in analyzing and improving the performance of GPU kernels, with measurable results (e.g. performance improvements, efficiency gains).
• Strong programming skills in C, C++, Perl, or Python
• Strong background in computer architecture
• Excel…

Responsibilities

• Design, develop, and optimize major LLM layers (e.g., attention, GEMM, inter-GPU communication) for NVIDIA's new architectures.
• Implement and fine-tune kernels to achieve optimal performance on NVIDIA GPUs.
• Conduct in-depth performance analysis of GPU kernels, including Attention and other critical operations.
• Identify bottlenecks, optimize resource utilization, and improve throughput and power efficiency.
• Create and maintain workloads and micro-benchmark suites to evaluate kernel performance across various hardware and software configurations.
• Generate performance projections, comparisons, and detailed analysis reports for internal and external stakeholders.
• Collaborate with architecture, software, and product teams to guide the development of next-generation deep learning hardware and software.

Related positions

AI Computing Performance Architect Intern, Perf Analysis and Kernel Dev - 2026

Internship

• Design, develop, and optimize major LLM layers (e.g., attention, GEMM, inter-GPU communication) for NVIDIA's new architectures.
• Implement and fine-tune kernels to achieve optimal performance on NVIDIA GPUs.
• Conduct in-depth performance analysis of GPU kernels, including Attention and other critical operations.
• Identify bottlenecks, optimize resource utilization, and improve throughput and power efficiency.
• Create and maintain workloads and micro-benchmark suites to evaluate kernel performance across various hardware and software configurations.
• Generate performance projections, comparisons, and detailed analysis reports for internal and external stakeholders.
• Collaborate with architecture, software, and product teams to guide the development of next-generation deep learning hardware and software.

Updated 2026-01-20 · Shanghai | Beijing

Deep Learning Performance Architect - Perf Tools

Experienced hire

• Architect Performance Tooling: Develop infrastructure tools and libraries for GPU performance analysis, visualization, and automated workflows used across the GPU SW/HW development life cycle.
• Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance perf debug and profiling capabilities.
• AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate perf optimization guidance, and improve the user experience of the profiling infrastructure.
• Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to support performance studies, improve the CUDA software stack, and co-design performance-centric solutions for current and next-generation GPU architectures.

Updated 2025-12-24 · Shanghai | Beijing

AI Computing Performance Architect

Experienced hire

• Analyze the performance of a wide range of machine learning and deep learning algorithms across existing and emerging architectures.
• Identify bottlenecks and devise creative software solutions or recommend improvements in GPU architectures.
• Explore and evaluate how hardware and software architectures interact with future algorithms and applications.

Updated 2026-03-18 · Shanghai

Deep Learning Performance Architect - Intern - 2026

Internship

NVIDIA is developing processor and system architectures that accelerate deep learning and high-performance computing applications. We are looking for an intern deep learning system performance architect to join our AI performance modelling, analysis, and optimization efforts. In this position, you will have a chance to work on DL performance modelling, analysis, and optimization on state-of-the-art hardware architectures for various LLM workloads. You will make your contributions to our dynamic, technology-focused company.

What you’ll be doing:
• Analyze state-of-the-art DL networks (LLMs, etc.), and identify and prototype performance opportunities to influence the SW and Architecture teams for NVIDIA's current and next-gen inference products.
• Develop analytical models for state-of-the-art deep learning networks and algorithms to drive innovation in processor and system architecture design for performance and efficiency.
• Specify hardware/software configurations and metrics to analyze performance, power, and accuracy in existing and future uni-processor and multiprocessor configurations.
• Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, software, and product teams.

Updated 2025-11-10 · Shanghai