
NVIDIA GPU Driver Profiler Engineer

Experienced hire · Full-time · Location: Shanghai · Status: Hiring

Qualifications


• B.S. in EE/CS (or equivalent experience) with 2+ years of experience, M.S. with 1+ years of experience, or Ph.D.
• Strong programming ability in C, C++, and scripting languages.
• Quick learner, willing to dive in where needed and debug complex code and UMD/KMD (user-mode/kernel-mode driver) interactions.
• Driver experience (preferably kernel driver).




Ways to stand out from the crowd:

• CPU or GPU hardware architecture knowledge
• Familiarity with power, performance, and clock control within the kernel
• Knowledge of a GPU API such as CUDA, OpenCL, OpenGL, OpenGL ES, DirectX, or a console graphics API
• Good understanding of embedded environments such as embedded Linux or a real-time OS

Responsibilities


• Revise, update, and test kernel interfaces, and review code used by the Developer Tools team
• Collect requirements from developer-tools features and work with the kernel team to co-design new interfaces
• Implement new features as well as HAL support for new GPU architectures
• Support various OSes and driver architectures: Windows WDDM, desktop Linux, mobile Linux, and QNX
• Contribute to next-generation architectures (both SW and HW)
Related positions

AMD
Experienced hire · Enginee

THE ROLE: Triton is a language and compiler for writing highly efficient custom deep learning primitives. It's widely adopted in open AI software stack projects like PyTorch, vLLM, SGLang, and many others. AMD GPUs are an official backend in Triton and we are fully committed to it. If you are interested in making GPUs run fast by developing the Triton compiler and kernels, please come join us!

Updated 2025-10-06
NVIDIA
Experienced hire

N/A

Updated 2025-08-29
NVIDIA
Intern

We are now looking for a Performance Engineer Intern to support our growing investments in performance testing of various company datacenter products and applications. Today, NVIDIA is tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPUs act as the brains of computers, robots, and self-driving cars that can understand the world, all while striving to deliver the highest possible performance of our products. You will be part of the global Performance Lab team, improving our capacity to expertly and accurately benchmark state-of-the-art datacenter applications and products. We also develop new scripts that enhance the team's ability to gather data through automation, and design efficient processes for testing a wide variety of applications and hardware. The data we collect drives marketing/sales collateral as well as engineering studies for current and future products. You will have the opportunity to work with multi-functional teams in a dynamic environment where multiple projects are active at once and priorities may shift frequently.

What you'll be doing:

• Benchmark, profile, and analyze the performance of AI workloads specifically tailored for large-scale LLM training and inference, as well as High-Performance Computing (HPC), on NVIDIA supercomputers and distributed systems
• Aggregate the testing data and produce written and visual reports for internal sales, marketing, SW, and HW teams
• Set up and configure systems with the appropriate hardware and software to run benchmarks
• Collaborate with internal teams to debug and resolve performance issues
• Develop Python scripts to automate the testing of various applications
• Assist with the development of tools and processes that improve our ability to perform automated testing

Updated 2025-10-13
Amazon
Experienced hire · Solution

*Hiring locations: Beijing, Shanghai, Guangzhou, Shenzhen, Hong Kong (visa sponsorship provided)

Would you like to join one of the fastest-growing teams within Amazon Web Services (AWS) and help shape the future of GPU optimization and high-performance computing? Join us in helping customers across all industries maximize the performance and efficiency of their GPU workloads on AWS while pioneering innovative optimization solutions. As a Senior Technical Account Manager (Sr. TAM) specializing in GPU Optimization in AWS Enterprise Support, you will play a crucial role in two key missions: guiding customers' GPU acceleration initiatives across AWS's comprehensive compute portfolio, and spearheading the development of optimization strategies that revolutionize customer workload performance.

Key Job Responsibilities

- Build and maintain long-term technical relationships with enterprise customers, focusing on GPU performance optimization and resource-allocation efficiency on AWS cloud or similar cloud services.
- Analyze customers' current architecture, models, data pipelines, and deployment patterns; create a GPU bottleneck map and measurable KPIs (e.g., GPU utilization, throughput, P95/P99 latency, cost per unit).
- Design and optimize GPU resource usage on EC2/EKS/SageMaker or equivalent cloud compute, container, and ML services; implement node-pool tiering, Karpenter/Cluster Autoscaler tuning, auto scaling, and cost governance (Savings Plans/RI/Spot/ODCR or equivalent).
- Drive GPU partitioning and multi-tenant resource-sharing strategies to reduce idle resources and increase overall cluster utilization.
- Guide customers in PyTorch/TensorFlow performance tuning (DataLoader optimization, mixed precision, gradient accumulation, operator fusion, torch.compile) and inference acceleration (ONNX, TensorRT, CUDA Graphs, model compression).
- Build GPU observability and monitoring systems (nvidia-smi, CloudWatch or equivalent monitoring tools, profilers, distributed-communication metrics) to align capacity planning with SLOs.
- Ensure compatibility across GPU drivers, CUDA, container runtimes, and frameworks; standardize change-management and rollback processes.
- Collaborate with cloud provider internal teams and external partners (NVIDIA, ISVs) to resolve complex cross-domain issues and deliver repeatable optimization solutions.

Updated 2025-08-18