NVIDIA Senior Solutions Architect, GPU System
Requirements
• BS/BA in Computer Science, Electrical/Computer Engineering, or equivalent experience, with 6+ years of experience with data center servers, GPU platforms, or large-scale AI/HPC infrastructure.
• Strong understanding of GPU server architecture: CPU/GPU balance, memory and PCIe/NVLink topology, storage and NIC placement, and power/cooling considerations.
• Proven experience designing or operating AI or HPC clusters using GPU-accelerated servers in cloud or on-prem data centers.
• Solid background in data center and cloud networking for AI workloads, including leaf-spine fabrics, RDMA, and high-bandwidth/low-latency designs.
• Strong Linux system and Linux networking skills, including driver, firmware, and OS-level tuning for GPU and NIC performance.
• Knowledge of and experience with Kubernetes (K8s) and RDMA/RoCE and, ideally, with RoCE and InfiniBand AI clusters.
• Excellent communicati…
Responsibilities
• Lead presales and architecture engagements with AI industry customers, focusing on GPU servers, AI clusters, and large-scale training/inference platforms built on NVIDIA HGX, GPU systems, and reference architectures.
• Design and validate end-to-end AI data center solutions, including server platforms, storage connectivity, and high-performance networking based on Spectrum, Quantum, ConnectX, and BlueField.
• Define system architectures for AI supercomputing, LLM training, and inference workloads, including node configuration, GPU topology, PCIe/NVLink considerations, and network design.
• Support business teams in exploring, developing, and deploying NVIDIA server and GPU solution opportunities, from early technical discovery through POC and production rollout.
• Own and execute POCs and hands-on labs that validate GPU server performance, scalability, reliability, and interoperability across compute, storage, and network domains.
• Troubleshoot complex end-to-end issues involving GPU servers, firmware, drivers, operating systems, and networking stacks, and drive fixes with internal R&D and partners.
• Provide structured feedback on platform features, system requirements, and customer needs to server OEMs, engineering, and product teams to improve NVIDIA AI platforms and ecosystems.
• Work with sales to introduce NVIDIA technologies and products.
• Act as account owner, promoting products to customers and bringing feedback to the product team.
• Deliver private or public workshops that present NVIDIA’s offerings in detail.
• Debug, tune, and test during qualification, POC, integration, and pilot phases.
• Build good relationships with customers at all levels and become a trusted advisor.
• Discover opportunities and guide customers to suitable solutions.
• Share knowledge across teams.
• Design, implement, and optimize scalable ML training pipelines for multimodal foundation models for robotics.
• Collaborate with researchers to integrate cutting-edge model architectures into scalable training pipelines.
• Implement scalable data loaders and preprocessors for multimodal datasets, such as videos, text, and sensor data.
• Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets.
• Develop robust monitoring and debugging tools to ensure the reliability and performance of training workflows on large GPU clusters.
We are now looking for a Senior Solutions Architect, an outstanding engineer able to engage with developers, researchers, and decision makers. We need individuals who can use AI to improve system efficiency and develop close relationships with our industry customers, making NVIDIA a key part of end-user solutions. NVIDIA is the world leader in GPU-accelerated computing and is looking for Solutions Architects to engage our customers. The Senior Solutions Architect will work closely with industry customers in our China region: establishing relationships, solving problems with their engineering teams, and helping them build a successful NVIDIA practice. If you are interested, do not hesitate to apply online; we are excited to talk with you!
What you’ll be doing:
• Present NVIDIA’s full-stack Artificial Intelligence solutions and end-to-end platform technology to customers and partners, with in-depth, hands-on engagement with customers or NVIDIA partners on complex data center projects.
• Assist field business development in guiding the customer through the sales process for NVIDIA solutions.
• Understand and analyze customer requirements, and support the solution design and development of applications.
• Work with teams across the company, including software, research, and product teams, to guide the direction of accelerated computing.
• Document what you learn to guide others. This can range from creating targeted training for customers and other Solutions Architects, giving hands-on demos, writing whitepapers, blogs, and wiki articles, and recording short videos, to simply working through hard problems with a partner on a whiteboard.
• Primary responsibilities will include building AI/HPC infrastructure for new and existing customers.
• Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, real-time monitoring, logging, and alerting.
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
• Provide feedback to internal teams, for example by opening bugs, documenting workarounds, and suggesting improvements.