英伟达Deep Learning Performance Architect - Intern - 2026

实习兼职2025-11-10地点：上海状态：招聘

扫码手机上打开

任职要求

• BS or higher degree in a relevant technical field (CS, EE, CE, Math, etc.).
• Strong programming skills in Python, C, C++. 
• Strong background in computer architecture.
• Experience with performance modeling, architecture simulation, profiling, and analysis.
• Prior experience with LLM or generative AI algorithms.



Ways to stand out from the crowd:
• GPU Computing and parallel programming models such as CUDA and OpenCL.
• Architecture of or workload analysis on other deep learning accelerators.
• Deep neural network training, inference and optimization in …

登录查看完整任职要求

微信扫码，1秒登录

工作职责

NVIDIA is developing processor and system architectures that accelerate deep learning and high-performance computing applications. We are looking for an intern deep learning system performance architect to join our AI performance modelling, analysis and optimization efforts. In this position, you will have a chance to work on DL performance modelling, analysis, and optimization on state-of-the-art hardware architectures for various LLM workloads. You will make your contributions to our dynamic technology focused company.
What you’ll be doing:
• Analyze state of the art DL networks (LLM etc.), identify and prototype performance opportunities to influence SW and Architecture team for NVIDIA&#39;s current and next gen inference products.
• Develop analytical models for the state of the art deep learning networks and algorithm to innovate processor and system architectures design for performance and efficiency.
• Specify hardware/software configurations and metrics to analyze performance, power, and accuracy in existing and future uni-processor and multiprocessor configurations.
• Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, software, and product teams.

📮 投递简历 ✨AI模拟面试

难度：

包括英文材料

Python+

C+

大模型+

CUDA+

还有更多 •••

登录查看完整学习资料

相关职位

SoC Performance Architect Intern - 2026

实习

• Design, implement and improve performance simulators and models to support the real-world use case study. • Develop and maintain software infrastructures for capturing, replaying and profiling complex application workloads running on our powerful SoC platform. • Conduct quantitative studies on the current and next-generation SoC architectures to evaluate performance across various use cases. • Investigate performance bottlenecks and propose architectural ideas to improve overall system and application performance.

更新于 2025-12-18上海

AI Computing Performance Architect Intern, Perf Analysis and Kernel Dev - 2026

实习

• Design, develop, and optimize major layers in LLM (e.g attention, GEMM, inter-GPU communication) for NVIDIA's new architectures. • Implement and fine-tune kernels to achieve optimal performance on NVIDIA GPUs. • Conduct in-depth performance analysis of GPU kernels, including Attention and other critical operations. • Identify bottlenecks, optimize resource utilization, and improve throughput, and power efficiency • Create and maintain workloads and micro-benchmark suites to evaluate kernel performance across various hardware and software configurations. • Generate performance projections, comparisons, and detailed analysis reports for internal and external stakeholders. • Collaborate with architecture, software, and product teams to guide the development of next-generation deep learning hardware and software.

更新于 2026-01-20上海|北京

AI Computing Performance Architect, Perf Analysis and Kernel Dev

社招

更新于 2025-09-03上海

Deep Learning Performance Architect

社招

We are now looking for a Deep Learning Performance Software Engineer! We are expanding our research and development for Inference. We seek excellent Software Engineers and Senior Software Engineers to join our team.We specialize in developing GPU-accelerated Deep learning software. Researchers around the world are using NVIDIA GPUs to power a revolution in deep learning, enabling breakthroughs in numerous areas. Join the team that builds software to enable new solutions. Collaborate with the deep learning community to implement the latest algorithms for public release in Tensor-RT. Your ability to work in a fast-paced customer-oriented team is required and excellent communication skills are necessary. What you’ll be doing: • Develop highly optimized deep learning kernels for inference • Do performance optimization, analysis, and tuning • Work with cross-collaborative teams across automotive, image understanding, and speech understanding to develop innovative solutions • Occasionally travel to conferences and customers for technical consultation and training

更新于 2025-09-23上海|北京