NVIDIA: Senior Solutions Architect, InfiniBand and Networking Ethernet - NVIS

Experienced hire | Full-time | Location: Beijing | Status: Hiring

Requirements


• BS/MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, or other engineering fields, with at least 8 years of work or research experience in networking fundamentals, the TCP/IP stack, and data center architecture.
• 8+ years of experience configuring, testing, validating, and resolving issues in LAN and InfiniBand networks, including use of validation tools for InfiniBand health and performance in medium- to large-scale HPC/AI network environments.
• Knowledge of and experience with Linux system administration/DevOps, process management, package management, task scheduling, kernel management, boot procedures, troubleshooting, performance reporting/optimization/logging, and network routing/advanced networking (tuning and monitoring).
• Strong focus on customer needs and satisfaction. Self-motivated, with excellent leadership skills, including when working with customers.
• Extensive knowledge of automation, including delivering fully automated network provisioning solutions using Ansible, Salt, and Python.
• Strong written, verbal, and listening skills in English are essential.

Ways to stand out from the crowd:
• Linux or Networking Certifications.
• Experience with high-performance computing architectures and an understanding of how job schedulers (Slurm, PBS) work.
• Proven knowledge of Python or Bash. Delivery experience as an Infrastructure Specialist.
• Knowledge of cluster management technologies (bonus credit for BCM, Base Command Manager).
• Experience with GPU (Graphics Processing Unit) focused hardware/software, as well as experience with MPI (Message Passing Interface).

Responsibilities


• Primary responsibilities will include building AI/HPC infrastructure for new and existing customers.
• Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, real-time monitoring, logging, and alerting.
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
• Provide feedback to internal teams, for example by opening bugs, documenting workarounds, and suggesting improvements.
Related positions

Experienced hire

NVIDIA networking designs and manufactures high-performance networking equipment that enables the most powerful supercomputers in the largest data centers in the world. With a distributed collection of NVIDIA GPUs interconnected by networking solutions such as InfiniBand, Ethernet, or RoCE (RDMA over Converged Ethernet), we make powerful ML/AI platforms possible. We are seeking motivated, personable, and independent individuals to join our team!

We seek experienced embedded software engineers to help support our groundbreaking, innovative technologies that make AI workloads in large clusters even more performant. As a networking Sr. Solutions Architect at NVIDIA, you will have agency and a palpable effect on the business, and will work closely with customers and R&D teams.

What you’ll be doing:
• Support networking technologies such as Spectrum-X and work with customers on their technical challenges and requirements using these technologies during pre-sales activities
• Develop proof-of-concept materials for innovative technologies for use by early adopters
• Gain customers’ trust and understand their needs to help design and deploy groundbreaking NVIDIA networking platforms to run AI and HPC workloads
• Address sophisticated and highly visible customer issues
• Work closely with R&D teams to develop new features for customers
• Help with product requirements alongside engineering and product marketing

Updated 2025-06-15
Experienced hire

NVIDIA networking is a world-leading, fast-growing company that supports the most powerful supercomputers and the largest data centers in the world. We make outstanding artificial intelligence happen with NVIDIA GPUs that accelerate the computing platform and networking solutions based on InfiniBand, Ethernet, or RoCE (RDMA over Converged Ethernet). We believe in our people and products and seek excellent people to join us!

The Networking Solutions Architects team is looking for a hardworking, keen software networking engineer to join the team and support the Spectrum-X networking platform, a revolutionary solution for building multi-tenant, hyperscale AI clouds with Ethernet. As a Networking Solutions Architect, you will have a real impact on the business while working closely with our customers, marketing, and R&D teams.

What you’ll be doing:
• Work as a customer technical specialist to address customer requirements and technical challenges during the pre-sales activities of the Spectrum-X solution.
• Run and own proof-of-concept activities introducing our products and integrating them into new and existing accounts.
• Support numerous levels of software running on NVIDIA's Ethernet Switches and BlueField Smart NIC.
• Debug networking and performance issues and provide solutions to customers.
• Work closely with our R&D teams to solve customer issues.
• Participate in building the SW product roadmap by providing customer product requirements and feedback to engineering and marketing teams.

Updated 2025-06-12
Experienced hire

• Provide Ethernet and routing expertise to customers during project delivery to design, architect, and test Ethernet networking solutions.
• Work on multi-functional teams to provide Ethernet network expertise for server infrastructure builds, accelerated computing workloads, and GPU-enabled AI applications.
• Craft and evaluate DevOps automation scripts for network operations, craft network architectures, and develop switch fabric configurations.
• Implement tasks related to network configuration and validation for data centers.
• Create Methods of Procedure and deployment documents.
• Use software tools to validate and monitor network performance.

Updated 2025-09-18
Experienced hire

NVIDIA data center systems, such as DGX and HGX, have become core to NVIDIA's rapidly growing enterprise and cloud provider businesses. These platforms bring together the full power of NVIDIA GPUs, NVIDIA NVLink, NVIDIA InfiniBand networking, NVIDIA Grace CPUs, and a fully optimized NVIDIA AI and HPC software stack. We are hiring a Sr. Software Engineer who will help build simulators for our DGX Server platforms. Simulations play a significant role in building scalable systems at Speed of Light! You will work with world-class engineering teams across HW and SW.

What you’ll be doing:
• Contribute to architecting and developing the simulation platform for next-gen NVIDIA DGX platforms.
• Build, integrate, and enhance simulator components with new HW features and write supporting technical documents.
• Bring the full SW stack up on the DGX Simulator; work closely with hardware modeling, kernel, and platform driver teams distributed globally.
• Improve performance, fix bugs across the user and kernel stack, and automate execution flow.

Updated 2025-09-22