NVIDIA GPU Driver Profiler Engineer
Requirements
• B.S. in EE/CS (or equivalent experience) with 2+ years of experience, M.S. with 1+ years of experience, or Ph.D.
• Strong programming ability in C, C++, and scripting languages
• Quick learner, willing to dive in where needed to debug complex code and UMD/KMD (user-mode/kernel-mode driver) interactions
• Driver experience (preferably kernel driver…
Responsibilities
• Revise, update, and test kernel interfaces, and review code used by the Developer Tools team
• Collect feature requirements from software developer tools and work with the kernel team to co-design new interfaces
• Implement new features as well as the HAL (hardware abstraction layer) to support new GPU architectures
• Support various OSes and driver architectures: Windows WDDM, Linux Desktop, Mobile Linux, and QNX
• Contribute to next-gen architectures (both SW and HW)
THE ROLE: Triton is a language and compiler for writing highly efficient custom deep learning primitives. It is widely adopted in open AI software stack projects such as PyTorch, vLLM, SGLang, and many others. AMD GPU is an official backend in Triton, and we are fully committed to it. If you are interested in making GPUs run fast by developing the Triton compiler and kernels, please come join us!
We are now looking for a Performance Engineer Intern to support our growing investments in performance testing of the company’s datacenter products and applications. Today, NVIDIA is tapping into the unlimited potential of AI to define the next era of computing: an era in which our GPUs act as the brains of computers, robots, and self-driving cars that can understand the world, all while we strive to deliver the highest possible performance from our products. You will be part of the global Performance Lab team, improving our capacity to benchmark state-of-the-art datacenter applications and products expertly and accurately. We also develop new scripts that enhance the team’s ability to gather data through automation, and we design efficient processes for testing a wide variety of applications and hardware. The data that we collect drives marketing and sales collateral as well as engineering studies for current and future products. You will have the opportunity to work with multi-functional teams in a dynamic environment where multiple projects are active at once and priorities may shift frequently.
What you’ll be doing:
• Benchmark, profile, and analyze the performance of AI workloads tailored for large-scale LLM training and inference, as well as High-Performance Computing (HPC), on NVIDIA supercomputers and distributed systems
• Aggregate the testing data and produce written and visual reports for internal sales, marketing, SW, and HW teams
• Set up and configure systems with the appropriate hardware and software to run benchmarks
• Collaborate with internal teams to debug performance issues and improve performance
• Develop Python scripts to automate the testing of various applications
• Assist with the development of tools and processes that improve our ability to perform automated testing
We are now looking for a Performance Engineer Intern to support our growing investments in performance testing of the company’s datacenter products and applications. Today, NVIDIA is tapping into the unlimited potential of AI to define the next era of computing: an era in which our GPUs act as the brains of computers, robots, and self-driving cars that can understand the world, all while we strive to deliver the highest possible performance from our products. You will be part of the global Performance Lab team, improving our capacity to benchmark state-of-the-art datacenter applications and products expertly and accurately. We also develop infrastructure and solutions that enhance the team’s ability to gather data through automation, and we design efficient processes for testing a wide variety of applications and hardware. The data that we collect drives marketing and sales collateral as well as engineering studies for future products. You will have the opportunity to work with multi-functional teams in a dynamic environment where multiple projects are active at once and priorities may shift frequently.
What you’ll be doing:
• Benchmark, profile, and analyze the performance of AI workloads tailored for large-scale LLM training and inference, as well as High-Performance Computing (HPC), on NVIDIA supercomputers and distributed systems.
• Aggregate the testing data and produce written reports for internal sales, marketing, SW, and HW teams.
• Develop Python scripts to automate the testing of various applications.
• Collaborate with internal teams to debug performance issues and improve performance.
• Assist with the development of tools and processes that improve our ability to perform automated testing.
• Set up and configure systems with the appropriate hardware and software to run benchmarks.