NVIDIA Senior Networking Hardware FAE
Requirements
• MS or PhD, or equivalent. Strong academic background in Computer or Electrical Engineering, Computer Science, or a related field.
• Knowledge of computing and data communication systems; familiar with server and switch systems.
• 7+ years of relevant work experience in hardware design and test, with a deep understanding of SI and PI; knowledge of thermal design is a plus. Hands-on experience with high-speed IO debugging and tuning is a must.
• Familiar with hardware development and testing tools, and with the 802.3 and PCIe protocols.
• Ba…
Responsibilities
NVIDIA is the world leader in computer graphics, PC gaming, and accelerated computing. Today, we are tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brains of edge computers and robots that can understand the world. Doing what has never been done before takes vision, innovation, and the world’s best talent. At NVIDIA, our employees are passionate about accelerated computing. We're united in our quest to transform the way accelerated computing is used for work and play. Our technology impacts large language models in everyday copilots, and visual experiences in video game development, film production, space exploration, medicine, computational finance, and automotive design. And we've only scratched the surface of what we can accomplish when we apply our technology to these fields. We need passionate, hard-working, and creative people to help us pursue some of these outstanding opportunities.
We are now looking for a hardware expert to join the NVIDIA China FAE (Field Application Engineer) team to engage with customers on, and support, NVIDIA networking hardware solutions. As an FAE, you'll collaborate with the sales team to support our customers across networking chips, components, and hardware systems. You'll establish relationships with top customers, tackle engineering problems, and help customers build a successful NVIDIA practice.
What you’ll be doing:
• Assist field business development in guiding customers through the design-win process for NVIDIA data center solutions.
• Work with customers to understand their requirements, and lead support from architecture, schematics, simulation, and layout through production.
• Review customers’ hardware solutions and designs, support bring-up of customer designs, diagnose problems, and resolve technical issues.
• Take an active role in assessing the technical details of customer projects. Build close technical relationships with customers and partners.
• Collaborate across the company, working with NVIDIA's worldwide hardware, software, application engineering, and product teams to lead technical activities and customer support. Guide the direction of NVIDIA product implementations.
• Contribute to design reviews and product feature requirements across the entire Ethernet/NIC/DPU/Switch portfolio. Design and build setup topologies, with an emphasis on emulating customers' large-scale and complex environments.
• Collaborate closely with multi-functional teams, including hardware engineers, software developers, and domain experts, to deliver optimized solutions that meet the demanding requirements of AI workloads.
• Design tests and mentor the test automation team in implementing them. Generate comprehensive test reports during the release execution procedure, assist with reproducing and debugging complex customer use cases and determining the root cause of issues, and act as the engineering PIC for the full verification cycle of customer use cases.
• Complete end-to-end test scenarios in different scopes: Regression, Performance, Functional, and Scale. Report on testing progress and provide summary reports of testing activity.
• Profile, benchmark, and analyze deep learning models to identify areas for optimization and improvement in performance, efficiency, and accuracy, with a strong emphasis on networking aspects.
• Provide insights and recommendations based on the analysis of large-scale training results, focusing specifically on networking bottlenecks and optimizations, to improve model outcomes and achieve business objectives.
• Lead the design and development of advanced networking protocols for AI applications.
• Collaborate closely with cross-functional teams to define hardware requirements and solutions.
• Engage with customers to understand their needs and successfully implement brand-new technologies.
• Drive innovation and ensure the flawless execution of projects.
• Primary responsibilities will include building AI/HPC infrastructure for new and existing customers.
• Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, real-time monitoring, logging, and alerting.
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
• Provide feedback to internal teams, such as opening bugs, documenting workarounds, and suggesting improvements.
• Design and prototype scalable software systems that optimize distributed AI training and inference, focusing on throughput, latency, and memory efficiency.
• Develop and evaluate enhancements to communication libraries such as NCCL, UCX, and UCC, tailored to the unique demands of deep learning workloads.
• Collaborate with AI framework teams (e.g., TensorFlow, PyTorch, JAX) to improve integration, performance, and reliability of communication backends.
• Co-design hardware features (e.g., in GPUs, DPUs, or interconnects) that accelerate data movement and enable new capabilities for inference and model serving.
• Contribute to the evolution of runtime systems, communication libraries, and AI-specific protocol layers.
• Collaborate with customers to understand their needs and provide innovative solutions for them.