NVIDIA Senior Networking Hardware FAE
Requirements
• MS or PhD, or equivalent, with a strong academic background in Computer or Electrical Engineering, Computer Science, or a related field.
• Knowledge of computing and data communication systems; familiarity with server and switch systems.
• 7+ years of relevant work experience in hardware design and test, with a deep understanding of SI (signal integrity) and PI (power integrity); knowledge of thermal design is a plus. Hands-on experience in high-speed IO debugging and tuning is a must.
• Familiarity with hardware development and testing tools, and with the 802.3 and PCIe protocols.
• Ba…
Responsibilities
NVIDIA is the world leader in computer graphics, PC gaming, and accelerated computing. Today, we are tapping into the unlimited potential of AI to define the next era of computing: an era in which our GPU acts as the brains of edge computers and robots that can understand the world. Doing what has never been done before takes vision, innovation, and the world's best talent. At NVIDIA, our employees are passionate about accelerated computing. We're united in our quest to transform the way accelerated computing is used for work and play. Our technology has an impact on everything from the large language models behind everyday copilots and visual experiences in video game development to film production, space exploration, medicine, computational finance, and automotive design. And we've only scratched the surface of what we can accomplish when we apply our technology to these fields. We need passionate, hard-working, and creative people to help us pursue these outstanding opportunities.
We are now looking for a hardware expert to join the NVIDIA China FAE (Field Application Engineer) team to engage with customers and support hardware solutions based on NVIDIA networking products. As an FAE, you'll collaborate with the sales team to support our customers across networking chips, components, and hardware systems. You'll establish relationships with top customers, tackle engineering problems, and help customers build a successful NVIDIA practice.
What you'll be doing:
• Assist field business development in guiding customers through the design-win process for NVIDIA data center solutions.
• Work with customers to understand their requirements, and lead support from architecture, schematics, simulation, and layout through to production.
• Review customers' hardware solutions and designs, support bring-up of customer designs, diagnose problems, and resolve technical issues.
• Take an active role in assessing the technical details of customer projects. Build close technical relationships with customers and partners.
• Collaborate across the company, working with NVIDIA's worldwide hardware, software, application engineering, and product teams to lead technical activities and customer support. Guide the direction of NVIDIA product implementations.
• Contribute to design reviews and product feature requirements across the entire Ethernet/NIC/DPU/Switch portfolio. Design and build setup topologies, with an emphasis on emulating large-scale, complex customer environments.
• Collaborate closely with cross-functional teams, including hardware engineers, software developers, and domain experts, to deliver optimized solutions that meet the demanding requirements of AI workloads.
• Design tests and mentor the test automation team in implementing them. Generate comprehensive test reports during release execution, help reproduce and debug complex customer use cases and determine their root causes, and serve as the engineering PIC (person in charge) for the full verification cycle of customer use cases.
• Complete end-to-end test scenarios across different scopes (regression, performance, functional, and scale); report testing progress and provide summary reports of testing activity.
• Profile, benchmark, and analyze deep learning models to identify opportunities to improve performance, efficiency, and accuracy, with a strong emphasis on networking.
• Provide insights and recommendations based on analysis of large-scale training results, focusing specifically on networking bottlenecks and optimizations, to improve model outcomes and achieve business objectives.
• Leading the design and development of advanced networking protocols for AI applications.
• Collaborating closely with cross-functional teams to define hardware requirements and solutions.
• Engaging with customers to understand their needs and successfully implement brand new technologies.
• Driving innovation and ensuring the flawless execution of projects.
• Define and manage program schedules, deliverables, and milestones for networking and switch box programs, aligned with the roadmaps of CSP customers and their ODM/OEM partners.
• Translate CSP and ODM/OEM customer requirements into actionable tasks for all functional teams, driving timely issue resolution across engineering, quality, logistics, and sales.
• Provide horizontal leadership to drive cross-functional program execution, manage high-priority issues and concerns, and coordinate globally dispersed teams to ensure clear alignment on objectives.
• Act as the primary customer and partner interface, facilitating program kick-offs, technical discussions, design reviews, issue/bug tracking and resolution, customer qualification/validation efforts, and status updates throughout the product lifecycle.
• Drive program execution from design through production deployment, ensuring on-time delivery, quality, and customer acceptance.
• Provide ongoing post-deployment sustaining support, serving as case manager for server/networking system builds.
• Monitor factory production schedules, yields, and blockers, collaborating closely with CSP ODM/OEM partners to meet committed metrics, schedules, and operational targets.
• Proactively communicate program health, risks, dependencies, and key insights to customers, internal partners, and senior leadership, ensuring transparency and strong cross-functional alignment.
• Primary responsibilities will include building AI/HPC infrastructure for new and existing customers.
• Support the operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, real-time monitoring, logging, and alerting.
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
• Maintain live services by measuring and monitoring availability, latency, and overall system health.
• Provide feedback to internal teams by opening bugs, documenting workarounds, and suggesting improvements.