NVIDIA AI Computing Performance Architect
Requirements
• An MS or PhD in a relevant field such as Computer Science, Electrical Engineering, or Mathematics.
• At least 3 years of professional experience with performance modeling, analysis, and code optimization for deep learning operators on GPU, CPU, or LPU, including hands-on assembly or SIMD programming.
• A solid foundation in computer architecture.
• Proficiency in programming languages such as C, C++, Perl, or Python.

Ways to stand out from the crowd:
• You're knowledgeable about LLM frameworks and their fundamentals.
• Experience with parallel programming and CUDA or Open…
Responsibilities
• Analyze the performance of a wide range of machine learning and deep learning algorithms across existing and emerging architectures.
• Identify bottlenecks and devise creative software solutions or recommend improvements in GPU architectures.
• Explore and evaluate how hardware and software architectures interact with future algorithms and applications.
• Design, develop, and optimize major LLM layers (e.g., attention, GEMM, inter-GPU communication) for NVIDIA's new architectures.
• Implement and fine-tune kernels to achieve optimal performance on NVIDIA GPUs.
• Conduct in-depth performance analysis of GPU kernels, including attention and other critical operations.
• Identify bottlenecks, optimize resource utilization, and improve throughput and power efficiency.
• Create and maintain workloads and micro-benchmark suites to evaluate kernel performance across various hardware and software configurations.
• Generate performance projections, comparisons, and detailed analysis reports for internal and external stakeholders.
• Collaborate with architecture, software, and product teams to guide the development of next-generation deep learning hardware and software.
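At its simplest, a micro-benchmark of the kind described above is a timed kernel loop with a derived throughput figure. The sketch below (a CPU-side NumPy stand-in, with illustrative shapes and function names that are assumptions, not part of the role description) shows the basic pattern: warm-up, timed iterations, and an achieved-GFLOP/s report.

```python
import time
import numpy as np

def gemm_gflops(m, n, k, iters=10, dtype=np.float32):
    """Time C = A @ B and return achieved GFLOP/s.

    A GEMM of shape (m, k) x (k, n) performs 2*m*n*k floating-point ops
    (each multiply-add counted as two operations).
    """
    a = np.random.rand(m, k).astype(dtype)
    b = np.random.rand(k, n).astype(dtype)
    a @ b  # warm-up: exclude one-time setup and allocation costs
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    elapsed = time.perf_counter() - start
    return (2.0 * m * n * k * iters) / elapsed / 1e9

if __name__ == "__main__":
    for size in (256, 512, 1024):
        print(f"GEMM {size}^3: {gemm_gflops(size, size, size):.1f} GFLOP/s")
```

Sweeping sizes this way exposes where a kernel transitions from memory-bound to compute-bound, which is typically the first question in the bottleneck analysis the role calls for.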
NVIDIA is developing processor and system architectures that accelerate deep learning and high-performance computing applications. We are looking for an intern deep learning system performance architect to join our AI performance modelling, analysis, and optimization efforts. In this position, you will work on DL performance modelling, analysis, and optimization on state-of-the-art hardware architectures for various LLM workloads, and contribute to our dynamic, technology-focused company.

What you'll be doing:
• Analyze state-of-the-art DL networks (LLMs, etc.), and identify and prototype performance opportunities to influence the SW and architecture teams for NVIDIA's current and next-gen inference products.
• Develop analytical models of state-of-the-art deep learning networks and algorithms to innovate processor and system architecture designs for performance and efficiency.
• Specify hardware/software configurations and metrics to analyze performance, power, and accuracy in existing and future uni-processor and multiprocessor configurations.
• Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, software, and product teams.
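A common form of the analytical modelling described above is a roofline-style estimate: a kernel's runtime is bounded below by the larger of its compute time and its memory-traffic time. The sketch below is a minimal illustration; the peak-throughput numbers are placeholders for a hypothetical accelerator, not any particular GPU's specifications.

```python
def roofline_time_s(flops, bytes_moved, peak_flops, peak_bw_bytes):
    """Lower-bound kernel time: max of compute-bound and memory-bound estimates."""
    return max(flops / peak_flops, bytes_moved / peak_bw_bytes)

# Example: a 4096 x 4096 x 4096 FP16 GEMM on a hypothetical accelerator
# with 1e15 FLOP/s peak compute and 3e12 B/s peak memory bandwidth.
m = n = k = 4096
flops = 2 * m * n * k                      # each multiply-add counted as 2 ops
bytes_moved = 2 * (m * k + k * n + m * n)  # 2 bytes/element, ideal reuse
t = roofline_time_s(flops, bytes_moved, 1e15, 3e12)
print(f"lower-bound time: {t * 1e6:.1f} us")
```

Comparing this analytical lower bound against measured kernel time shows how far an implementation is from the hardware limit, and whether the next optimization should target compute utilization or memory traffic.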