NVIDIA GPU Workload Analysis Intern - 2026
Requirements
• Good communication and problem analysis skills
• Demonstrated knowledge of DL algorithms
• Experience training and fine-tuning models
• Experience building and improving …
Responsibilities
The GPU System Architect team's work covers the whole GPU pipeline (graphics, compute pipeline, memory system) as well as multi-GPU, CPU, and CPU interconnect, which offers a good opportunity to learn in depth the latest cross-unit features in new GPU architectures. The team acts as the safety net of the chip: we catch functional bugs in the hardware by randomly generating tests, running them on various pre-silicon full-chip platforms, and debugging the failures. This work provides a good full-chip view of the GPU and offers ample room to innovate.
What you'll be doing:
• Get familiar with the composition of various GPU workloads
• Learn the common feature metrics for GPU workloads
• Design and implement inventive solutions to efficiently extract features from GPU workloads
• Verify the solutions using directed and random GPU workloads
• Design and implement inventive solutions to simplify GPU workloads while keeping the required features
• Design and implement inventive solutions to generate GPU workloads according to required features
• Design and implement inventive solutions to generate GPU workloads that have the same features as a given test while randomizing other (required) features
• Thoroughly verify the solutions on the GPU functional simulator, full-chip RTL, emulation, and silicon platforms
• Provide detailed, organized documentation and reports for the project
NVIDIA is developing processor and system architectures that accelerate deep learning and high-performance computing applications. We are looking for a deep learning system performance architect intern to join our AI performance modelling, analysis, and optimization efforts. In this position, you will work on DL performance modelling, analysis, and optimization on state-of-the-art hardware architectures for various LLM workloads, contributing to our dynamic, technology-focused company.
What you'll be doing:
• Analyze state-of-the-art DL networks (LLMs, etc.), and identify and prototype performance opportunities that influence the SW and architecture teams for NVIDIA's current and next-gen inference products.
• Develop analytical models for state-of-the-art deep learning networks and algorithms to drive innovation in processor and system architecture design for performance and efficiency.
• Specify hardware/software configurations and metrics to analyze performance, power, and accuracy in existing and future uniprocessor and multiprocessor configurations.
• Collaborate across the company with architecture, software, and product teams to guide the direction of next-gen deep learning HW/SW.
• Use internally developed tools and industry-standard pre-silicon gate-level and RTL power analysis tools to help improve product power efficiency.
• Develop and share best practices for performing pre-silicon power analysis; enhance internal power tools and automate best practices.
• Perform comparative power analysis to spot trends and anomalies that warrant more scrutiny.
• Interact with architects and RTL designers to help them interpret their power data and identify power bugs, and drive them to implement fixes.
• Select and run a wide variety of workloads for power analysis; collaborate with performance and architecture teams to validate the performance of the workloads.
• Prototype a new architectural feature in Verilog and analyze its power.
We are now looking for a Performance Engineer Intern to support our growing investment in performance testing of the company's datacenter products and applications. Today, NVIDIA is tapping into the unlimited potential of AI to define the next era of computing: an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world, all while striving to deliver the highest possible performance from our products. You will be part of the global Performance Lab team, improving our capacity to expertly and accurately benchmark state-of-the-art datacenter applications and products. We also develop infrastructure and solutions that enhance the team's ability to gather data through automation, and we design efficient processes for testing a wide variety of applications and hardware. The data we collect drives marketing and sales collateral as well as engineering studies for future products. You will have the opportunity to work with multi-functional teams in a dynamic environment where multiple projects are active at once and priorities may shift frequently.
What you'll be doing:
• Benchmark, profile, and analyze the performance of AI workloads, specifically large-scale LLM training and inference, as well as High-Performance Computing (HPC) workloads, on NVIDIA supercomputers and distributed systems.
• Aggregate the testing data and produce written reports for internal sales, marketing, SW, and HW teams.
• Develop Python scripts to automate the testing of various applications.
• Collaborate with internal teams to debug performance issues and improve performance.
• Assist with the development of tools and processes that improve our ability to perform automated testing.
• Set up and configure systems with the appropriate hardware and software to run benchmarks.
An exciting internship opportunity to make an immediate contribution to AMD's next generation of technology innovations awaits you! We have a multifaceted, high-energy work environment filled with a diverse group of employees, and we provide outstanding opportunities for developing your career. During your internship, our programs provide the opportunity to collaborate with AMD leaders, receive one-on-one mentorship, attend amazing networking events, and much more. Being part of AMD means receiving hands-on experience that will give you a competitive edge. Together We Advance your career!
JOB DETAILS:
• Location: Shanghai, China
• Onsite/Hybrid: This role requires the student to work at least 3 days/week, in either a hybrid (minimum 3 days in office) or onsite work structure, throughout the duration of the co-op/intern term.
• Duration: Jan - June 2026
WHAT YOU WILL BE DOING:
We are seeking a highly motivated Machine Learning (ML)/Artificial Intelligence (AI) intern/co-op to join our team and contribute to the development of next-generation product differentiation features alongside expert ML/AI engineers. In this role, you will:
• Gain hands-on experience with cutting-edge technologies in ML, AI, and High-Performance Computing.
• Learn to analyze and optimize GPU kernels to maximize performance for specific AI operations.
• Contribute to projects such as researching, developing, and deploying machine learning and computer vision solutions for AMD's current and future products.
• Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs.
• Design and optimize deep learning models specifically for AMD GPU performance.
• Assist AI software teams with roadmap planning, collateral development, and customer engagements.
• Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream.
• Apply sound engineering principles to ensure robust, maintainable solutions.