NVIDIA GPU Workload Analysis Intern - 2026

Internship, Part-time | Location: Shanghai | Status: Hiring

Requirements


• Good communication and problem-analysis skills
• Demonstrated knowledge of DL algorithms
• Experience training and fine-tuning models
• Experience building and improving …

Responsibilities


The GPU System Architect team's scope covers the whole GPU pipeline (graphics, compute pipeline, memory system) as well as multi-GPU, CPU, and CPU interconnection, which provides a good opportunity to learn in depth the latest cross-unit features in new GPU architectures. The team works as the safety net of the chip: we catch functional bugs in the hardware by randomly generating tests, running them on various pre-silicon full-chip platforms, and debugging the failures. This work provides a good full-chip view of the GPU and leaves a lot of room to innovate.



What you’ll be doing:

• Get familiar with the composition of various GPU workloads
• Learn the usual feature metrics for GPU workloads
• Design and implement inventive solutions to efficiently extract features from GPU workloads
• Verify the solutions using directed and random GPU workloads
• Design and implement inventive solutions to simplify GPU workloads while keeping the required features
• Design and implement inventive solutions to generate GPU workloads according to required features
• Design and implement inventive solutions to generate GPU workloads that share the same features as a given test while randomizing other (required) features
• Thoroughly verify the solutions on GPU functional simulator, full-chip RTL, emulation, and silicon platforms
• Provide detailed, organized documentation and report out on the project
Includes English materials
Related positions

NVIDIA
Internship

NVIDIA is developing processor and system architectures that accelerate deep learning and high-performance computing applications. We are looking for an intern deep learning system performance architect to join our AI performance modelling, analysis, and optimization efforts. In this position, you will have a chance to work on DL performance modelling, analysis, and optimization on state-of-the-art hardware architectures for various LLM workloads. You will contribute to our dynamic, technology-focused company.

What you’ll be doing:

• Analyze state-of-the-art DL networks (LLMs etc.), and identify and prototype performance opportunities to influence the SW and Architecture teams for NVIDIA's current and next-gen inference products.
• Develop analytical models for state-of-the-art deep learning networks and algorithms to innovate processor and system architecture design for performance and efficiency.
• Specify hardware/software configurations and metrics to analyze performance, power, and accuracy in existing and future uni-processor and multiprocessor configurations.
• Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, software, and product teams.

Updated 2025-11-10 | Shanghai
NVIDIA
Internship

• Design, develop, and optimize major layers in LLMs (e.g., attention, GEMM, inter-GPU communication) for NVIDIA's new architectures.
• Implement and fine-tune kernels to achieve optimal performance on NVIDIA GPUs.
• Conduct in-depth performance analysis of GPU kernels, including attention and other critical operations.
• Identify bottlenecks, optimize resource utilization, and improve throughput and power efficiency.
• Create and maintain workloads and micro-benchmark suites to evaluate kernel performance across various hardware and software configurations.
• Generate performance projections, comparisons, and detailed analysis reports for internal and external stakeholders.
• Collaborate with architecture, software, and product teams to guide the development of next-generation deep learning hardware and software.

Updated 2026-01-20 | Shanghai, Beijing
NVIDIA
Internship

• Use internally developed tools and industry-standard pre-silicon gate-level and RTL power analysis tools to help improve product power efficiency.
• Develop and share best practices for performing pre-silicon power analysis; enhance internal power tools and automate best practices.
• Perform comparative power analysis to spot trends and anomalies that warrant more scrutiny.
• Interact with architects and RTL designers to help them interpret their power data and identify power bugs; drive them to implement fixes.
• Select and run a wide variety of workloads for power analysis; collaborate with performance and architecture teams to validate the performance of the workloads.
• Prototype a new architectural feature in Verilog and analyze its power.

Updated 2025-10-30 | Shanghai
NVIDIA
Internship

We are now looking for a Performance Engineer Intern to support our growing investments in performance testing of various company datacenter products and applications. Today, NVIDIA is tapping into the unlimited potential of AI to define the next era of computing: an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world, all while striving to deliver the highest possible performance of our products.

You will be part of the global Performance Lab team, improving our capacity to expertly and accurately benchmark state-of-the-art datacenter applications and products. We also develop infrastructure and solutions that enhance the team’s ability to gather data through automation, and we design efficient processes for testing a wide variety of applications and hardware. The data that we collect drives marketing and sales collateral as well as engineering studies for future products. You will have the opportunity to work with multi-functional teams in a dynamic environment where multiple projects are active at once and priorities may shift frequently.

What you’ll be doing:

• Benchmark, profile, and analyze the performance of AI workloads specifically tailored for large-scale LLM training and inference, as well as High-Performance Computing (HPC), on NVIDIA supercomputers and distributed systems.
• Aggregate the testing data and produce written reports for internal sales, marketing, SW, and HW teams.
• Develop Python scripts to automate the testing of various applications.
• Collaborate with internal teams to debug and resolve performance issues.
• Assist with the development of tools and processes that improve our ability to perform automated testing.
• Set up and configure systems with appropriate hardware and software to run benchmarks.

Updated 2025-11-19 | Shanghai