NVIDIA InfiniBand Intern - 2026
Requirements
• Master's or Ph.D. student in computer science or networking
• Familiar with C++ and Python programming langua…
Responsibilities
Today, NVIDIA is tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, encouraging environment where everyone is inspired to do their best work. Come join the team and see how we can make a lasting impact on the world.

The network defines the data center. State-of-the-art AI networking and in-network computing technologies can significantly improve application performance and scalability in the AI cloud. The NVIDIA Quantum-X InfiniBand platform and the Spectrum-X Ethernet platform are the major AI network platforms in the market.

What you’ll be doing:
• Work with the NVIDIA network research team to define the communication algorithm architecture of GPU and CPU clusters.
• Leverage NCCL and new advanced communication technologies to get better GPU communication performance.
• Work on new network collective communication optimizations.
NVIDIA networking designs and manufactures high-performance networking equipment that enables the most powerful supercomputers in the largest data centers in the world. With a distributed collection of NVIDIA GPUs interconnected by networking solutions such as InfiniBand, Ethernet, or RoCE (RDMA over Converged Ethernet), we make powerful ML/AI platforms possible. We are seeking motivated, personable, and independent individuals to join our team!

We seek experienced embedded software engineers to help support our groundbreaking, innovative technologies that make AI workloads in large clusters even more performant. As a networking Sr. Solutions Architect at NVIDIA, you will have agency and a palpable effect on the business, and you will work closely with customers and R&D teams.

What you’ll be doing:
• Support networking technologies such as Spectrum-X, and work with customers on their technical challenges and requirements for those technologies during pre-sales activities.
• Develop proof-of-concept materials for innovative technologies for use by early adopters.
• Gain customers’ trust and understand their needs to help design and deploy groundbreaking NVIDIA networking platforms to run AI and HPC workloads.
• Address sophisticated and highly visible customer issues.
• Work closely with R&D teams to develop new features for customers.
• Help with product requirements alongside engineering and product marketing.
• Primary responsibilities will include building AI/HPC infrastructure for new and existing customers.
• Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, real-time monitoring, logging, and alerting.
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
• Provide feedback to internal teams, such as opening bugs, documenting workarounds, and suggesting improvements.

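● Own the installation, operations, and lifecycle management of supercomputing clusters
● Tune firmware and drivers (H20/H200 GPU + mlx5_core NIC)
● Operate and optimize parallel file systems
● Profile and debug performance with NCCL/UCX
● Participate in the 24×7 on-call rotation and P1 incident response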