
NVIDIA: Senior HPC and AI Networking Performance Research and Analysis Engineer

Experienced hire | Full-time | Location: Shanghai | Guangzhou | Beijing | Shenzhen | Status: Hiring

Requirements


• B.Sc. in Computer Science or Software Engineering
• 8+ years of experience with high-performance networking (RDMA, MPI, NCCL)
• Demonstrated performance analysis skills and methodologies
• Experience with NVIDIA GPUs, CUDA libraries, and deep learning frameworks such as TensorFlow or PyTorch, combined with expertise in collective communication libraries (such as NCCL) and networking protocols (such as RoCE and RDMA)
• Fast, self-directed learner with strong analytical and problem-solving skills
• Programming languages: Python, Bash, and C
• Experience with Linux distributions
• Team player with good communication and interpersonal skills

Ways to stand out from the crowd:
• In-depth kno…

Responsibilities


• Experiment with and research AI workloads and DL models for large-scale LLM training on NVIDIA supercomputers, with a focus on high-performance networking.
• Benchmark, profile, and analyze performance to find bottlenecks and identify areas for improvement and optimization, with a strong emphasis on networking.
• Implement performance analysis tools.
• Collaborate with many teams, from HW to SW, to provide performance analysis insights.
• Define performance test plans, set performance expectations for new technologies and solutions, and work to reach the performance targets.
Includes English-language materials: Message Passing Interface, NCCL, CUDA, TensorFlow, PyTorch
Related positions

Experienced hire

• Primary responsibilities will include building AI/HPC infrastructure for new and existing customers.
• Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, real-time monitoring, logging, and alerting.
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
• Provide feedback to internal teams, such as opening bugs, documenting workarounds, and suggesting improvements.

Updated 2025-09-29 | Beijing
Experienced hire

• Design and prototype scalable software systems that optimize distributed AI training and inference, focusing on throughput, latency, and memory efficiency.
• Develop and evaluate enhancements to communication libraries such as NCCL, UCX, and UCC, tailored to the unique demands of deep learning workloads.
• Collaborate with AI framework teams (e.g., TensorFlow, PyTorch, JAX) to improve integration, performance, and reliability of communication backends.
• Co-design hardware features (e.g., in GPUs, DPUs, or interconnects) that accelerate data movement and enable new capabilities for inference and model serving.
• Contribute to the evolution of runtime systems, communication libraries, and AI-specific protocol layers.
• Collaborate with customers to understand their needs and provide innovative solutions for them.

Updated 2025-10-20 | Beijing
Experienced hire

NVIDIA networking designs and manufactures high-performance networking equipment that enables the most powerful supercomputers in the largest data centers in the world. With a distributed collection of NVIDIA GPUs interconnected by networking solutions such as InfiniBand, Ethernet, or RoCE (RDMA over Converged Ethernet), we make powerful ML/AI platforms possible. We are seeking motivated, personable, and independent individuals to join our team!

We seek experienced embedded software engineers to help support our groundbreaking, innovative technologies that make AI workloads in large clusters even more performant. As a networking Sr. Solutions Architect at NVIDIA, you will have agency and a palpable effect on the business, and work closely with customers and R&D teams.

What you’ll be doing:
• Support networking technologies such as Spectrum-X and work with customers on their technical challenges and requirements for those technologies during pre-sales activities
• Develop proof-of-concept materials for innovative technologies for use by early adopters
• Gain customers’ trust and understand their needs to help design and deploy groundbreaking NVIDIA networking platforms to run AI and HPC workloads
• Address sophisticated and highly visible customer issues
• Work closely with R&D teams to develop new features for customers
• Help with product requirements alongside engineering and product marketing

Updated 2025-06-15 | Beijing
Experienced hire

• Primary responsibilities will include deploying, managing, and maintaining AI/HPC infrastructure in Linux-based environments for new and existing customers.
• Be the domain expert with customers from planning calls through implementation.
• Hand over related documentation and perform the knowledge transfers required to support customers as they begin rolling out some of the most sophisticated systems in the world!
• Provide feedback to internal teams, such as opening bugs, documenting workarounds, and suggesting improvements.

Updated 2025-09-15 | Beijing | Shanghai | Shenzhen