NVIDIA GPU Workload Analysis Intern - 2026
Qualifications
• Good communication and problem-analysis skills
• Demonstrated knowledge of DL algorithms
• Experience training and fine-tuning models
• Experience building and improving …
Responsibilities
The GPU System Architect team’s work scope covers the whole GPU pipeline (graphics, compute pipeline, memory system) as well as multi-GPU, CPU and CPU interconnection, which provides a good opportunity to learn in depth the latest cross-unit features in new GPU architectures. The team works as the safety net of the chip: we catch functional bugs in the HW by randomly generating tests, running them on various pre-silicon full-chip platforms, and debugging the failures. This work provides a good full-chip view of the GPU and leaves ample room to innovate.
What you’ll be doing:
• Get familiar with the composition of various GPU workloads
• Learn the usual feature metrics for GPU workloads
• Design and implement inventive solutions to efficiently extract features from GPU workloads
• Verify the solutions using directed and random GPU workloads
• Design and implement inventive solutions to simplify GPU workloads while keeping the required features
• Design and implement inventive solutions to generate GPU workloads according to required features
• Design and implement inventive solutions to generate GPU workloads that share the same features as a given test while randomizing other (required) features
• Thoroughly verify the solutions on GPU functional simulator, full-chip RTL, emulation, and silicon platforms
• Provide detailed, organized documentation and reporting for the project
NVIDIA is developing processor and system architectures that accelerate deep learning and high-performance computing applications. We are looking for a deep learning system performance architect intern to join our AI performance modelling, analysis, and optimization efforts. In this position, you will have a chance to work on DL performance modelling, analysis, and optimization on state-of-the-art hardware architectures for various LLM workloads. You will contribute to our dynamic, technology-focused company.
What you’ll be doing:
• Analyze state-of-the-art DL networks (LLMs, etc.), and identify and prototype performance opportunities to influence the SW and Architecture teams for NVIDIA's current and next-gen inference products.
• Develop analytical models for state-of-the-art deep learning networks and algorithms to drive innovation in processor and system architecture design for performance and efficiency.
• Specify hardware/software configurations and metrics to analyze performance, power, and accuracy in existing and future uni-processor and multiprocessor configurations.
• Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, software, and product teams.
• Design, develop, and optimize major layers in LLMs (e.g., attention, GEMM, inter-GPU communication) for NVIDIA's new architectures.
• Implement and fine-tune kernels to achieve optimal performance on NVIDIA GPUs.
• Conduct in-depth performance analysis of GPU kernels, including attention and other critical operations.
• Identify bottlenecks, optimize resource utilization, and improve throughput and power efficiency.
• Create and maintain workloads and micro-benchmark suites to evaluate kernel performance across various hardware and software configurations.
• Generate performance projections, comparisons, and detailed analysis reports for internal and external stakeholders.
• Collaborate with architecture, software, and product teams to guide the development of next-generation deep learning hardware and software.
• Use internally developed tools and industry-standard pre-silicon gate-level and RTL power analysis tools to help improve product power efficiency.
• Develop and share best practices for performing pre-silicon power analysis; enhance internal power tools and automate best practices.
• Perform comparative power analysis to spot trends and anomalies that warrant more scrutiny.
• Interact with architects and RTL designers to help them interpret their power data and identify power bugs; drive them to implement fixes.
• Select and run a wide variety of workloads for power analysis; collaborate with performance and architecture teams to validate the performance of the workloads.
• Prototype a new architectural feature in Verilog and analyze its power.
An exciting internship opportunity to make an immediate contribution to AMD's next generation of technology innovations awaits you! We have a multifaceted, high-energy work environment filled with a diverse group of employees, and we provide outstanding opportunities for developing your career. During your internship, our programs provide the opportunity to collaborate with AMD leaders, receive one-on-one mentorship, attend amazing networking events, and much more. Being part of AMD means receiving hands-on experience that will give you a competitive edge. Together We Advance your career!
JOB DETAILS:
Location: Shanghai, China
Onsite/Hybrid: This role requires the student to work at least 3 days/week, either in a hybrid (minimum 3 days in office) or onsite work structure, throughout the duration of the co-op/intern term.
Duration: Jan - June 2026
WHAT YOU WILL BE DOING:
We are seeking a highly motivated Machine Learning (ML)/Artificial Intelligence (AI) intern/co-op to join our team and contribute to the development of next-generation product differentiation features alongside expert ML/AI engineers. In this role, you will:
• Gain hands-on experience with cutting-edge technologies in ML, AI, and High-Performance Computing.
• Learn to analyze and optimize GPU kernels to maximize performance for specific AI operations.
Contribute to projects such as:
• Researching, developing, and deploying machine learning and computer vision solutions for AMD's current and future products.
• Working closely with internal teams to analyze and improve training and inference performance on AMD GPUs.
• Designing and optimizing deep learning models specifically for AMD GPU performance.
• Assisting AI software teams with roadmap planning, collateral development, and customer engagements.
• Engaging with framework maintainers to ensure code changes are aligned with requirements and integrated upstream.
• Applying sound engineering principles to ensure robust, maintainable solutions.