NVIDIA Accelerated Compute Systems Performance Architect Intern - 2026
Requirements
• Pursuing a B.Sc., M.Sc., or Ph.D. in a relevant discipline (CS, EE, CE).
• A passion for performance analysis and optimization.
• Hands-on experience with the massively parallel GPU programming model, e.g. CUDA or OpenCL. Familiarity with APIs for multi-node communication, such as MPI or OpenSHMEM/NVSHMEM, is a plus.
• Solid background in GPU …
Responsibilities
• Performing in-depth analysis and optimization to ensure the best possible performance on current and/or next-generation NVIDIA GPUs.
• Understanding and analyzing the interplay of hardware and software architectures on core algorithms, programming models, and applications.
• Actively collaborating with the hardware design, software engineering, product, and research teams to guide the direction of accelerated computing.
• Diving into accelerated computing applications to facilitate software-hardware co-design.
• Writing up and presenting your work in white papers, conference publications, official blog posts, patent applications, etc., as appropriate.
NVIDIA is developing processor and system architectures that accelerate deep learning and high-performance computing applications. We are looking for a deep learning system performance architect intern to join our AI performance modelling, analysis, and optimization efforts. In this position, you will have a chance to work on DL performance modelling, analysis, and optimization on state-of-the-art hardware architectures for various LLM workloads. You will make your contributions to our dynamic, technology-focused company.
What you’ll be doing:
• Analyze state-of-the-art DL networks (LLMs etc.), and identify and prototype performance opportunities to influence the SW and Architecture teams for NVIDIA's current and next-gen inference products.
• Develop analytical models for state-of-the-art deep learning networks and algorithms to innovate processor and system architecture design for performance and efficiency.
• Specify hardware/software configurations and metrics to analyze performance, power, and accuracy in existing and future uni-processor and multiprocessor configurations.
• Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, software, and product teams.
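The "analytical models … for performance and efficiency" bullet above is the kind of work the classic roofline model captures. As a minimal, illustrative sketch (not NVIDIA's internal methodology), a kernel's attainable throughput is capped either by peak compute or by memory bandwidth times its arithmetic intensity:

```python
# Minimal roofline-model sketch (illustrative only): attainable throughput
# is min(peak compute, arithmetic intensity * memory bandwidth).
def attainable_tflops(flops, bytes_moved, peak_tflops, bandwidth_tbs):
    """Roofline upper bound on throughput (TFLOP/s) for one kernel.

    flops         -- floating-point operations the kernel performs
    bytes_moved   -- bytes read from + written to DRAM
    peak_tflops   -- device peak compute (TFLOP/s)
    bandwidth_tbs -- device memory bandwidth (TB/s)
    """
    intensity = flops / bytes_moved           # FLOP per byte of DRAM traffic
    return min(peak_tflops, intensity * bandwidth_tbs)

# A low-intensity kernel is bandwidth-bound; a high-intensity one is compute-bound.
print(attainable_tflops(1e9, 1e9, 100.0, 2.0))   # -> 2.0 (bandwidth-bound)
print(attainable_tflops(1e12, 1e9, 100.0, 2.0))  # -> 100.0 (compute-bound)
```

The device numbers (100 TFLOP/s peak, 2 TB/s bandwidth) are placeholders, not a specific GPU's spec.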
*Hiring location: Beijing, Shanghai, Guangzhou, Shenzhen, Hong Kong (visa sponsorship provided)
------------------------------------------------------
Would you like to join one of the fastest-growing teams within Amazon Web Services (AWS) and help shape the future of GPU optimization and high-performance computing? Join us in helping customers across all industries maximize the performance and efficiency of their GPU workloads on AWS while pioneering innovative optimization solutions. As a Senior Technical Account Manager (Sr. TAM) specializing in GPU Optimization in AWS Enterprise Support, you will play a crucial role in two key missions: guiding customers' GPU acceleration initiatives across AWS's comprehensive compute portfolio, and spearheading the development of optimization strategies that transform customer workload performance.
Key Job Responsibilities
- Build and maintain long-term technical relationships with enterprise customers, focusing on GPU performance optimization and resource allocation efficiency on AWS or similar cloud services.
- Analyze customers' current architecture, models, data pipelines, and deployment patterns; create a GPU bottleneck map and measurable KPIs (e.g., GPU utilization, throughput, P95/P99 latency, cost per unit).
- Design and optimize GPU resource usage on EC2/EKS/SageMaker or equivalent cloud compute, container, and ML services; implement node pool tiering, Karpenter/Cluster Autoscaler tuning, auto scaling, and cost governance (Savings Plans/RI/Spot/ODCR or equivalent).
- Drive GPU partitioning and multi-tenant resource-sharing strategies to reduce idle resources and increase overall cluster utilization.
- Guide customers in PyTorch/TensorFlow performance tuning (DataLoader optimization, mixed precision, gradient accumulation, operator fusion, torch.compile) and inference acceleration (ONNX, TensorRT, CUDA Graphs, model compression).
- Build GPU observability and monitoring systems (nvidia-smi, CloudWatch or equivalent monitoring tools, profilers, distributed communication metrics) to align capacity planning with SLOs.
- Ensure compatibility across GPU drivers, CUDA, container runtimes, and frameworks; standardize change management and rollback processes.
- Collaborate with cloud provider internal teams and external partners (NVIDIA, ISVs) to resolve complex cross-domain issues and deliver repeatable optimization solutions.
------------------------------------------------------
NVIDIA Networking has been a leader in high-performance networking infrastructure for many years. The next unit of computing is the datacenter, and the network makes it all possible. NVIDIA is looking to grow its networking architecture team with people passionate about accelerated computing. We are looking for you, a Senior Networking Architect, to develop the next generation of networks for AI. If you love researching new technologies, we'd love to hear from you!
What you’ll be doing:
• Take an active part in research and development of networking solutions to enable the next generation of accelerated data centers.
• Work across layers, from ASIC architecture to algorithms, topologies, systems, and more.
• Partner with internal teams, strategic customers and partners, standards bodies, and networking communities to initiate and develop groundbreaking networking solutions for AI!
• Define the end-to-end technical architecture for the NIM Factory, from container build systems and CI/CD to Kubernetes deployment patterns and runtime optimization.
• Drive technical strategy and roadmap, making high-impact decisions on frameworks, technologies, and standards that empower dozens of engineering teams.
• Architect and influence the design of workflow orchestration systems that underpin the NIM Factory.
• Coach and mentor senior engineers across the organization, fostering a culture of technical excellence, innovation, and knowledge sharing.
• Champion best practices in software development, including API design, automation, observability, and secure supply chain management.
• Collaborate with leadership across research, backend, SRE, and product to align technical vision with product goals and influence technical roadmaps.