NVIDIA Senior AI Performance and Efficiency Engineer

Experienced hire | Full-time | Location: Shanghai | Status: Hiring

Requirements


• BS or similar background in Computer Science or related area (or equivalent experience) 
• 8+ years of experience designing and operating large-scale compute infrastructure
• Strong understanding of modern ML techniques and tools 
• Experience investigating and resolving training and inference performance issues end to end
• Debugging and optimization experience with Nsight Systems and Nsight Compute
• Experience debugging large-scale distributed training using NCCL
• Proficiency in programming and scripting languages such as Python, Go, and Bash; familiarity with cloud computing platforms (e.g., AWS, GCP, Azure); and experience with parallel computing frameworks and paradigms
• Dedication to ongoing learning and staying updated on new technologies and innovative methods in the AI/ML in…

Responsibilities


• Collaborate closely with our AI/ML researchers to make their ML models more efficient, leading to significant productivity improvements and cost savings
• Build tools and frameworks, and apply ML techniques, to detect and analyze efficiency bottlenecks and deliver productivity improvements for our researchers
• Work with researchers on a variety of innovative ML workloads across robotics, autonomous vehicles, LLMs, video, and more
• Collaborate across the engineering organizations to deliver efficiency in our usage of hardware, software, and infrastructure 
• Proactively monitor fleet-wide utilization patterns, analyze existing inefficiency patterns or discover new ones, and deliver scalable solutions to address them
• Keep up to date with the most recent developments in AI/ML technologies, frameworks, and successful strategies, and advocate for their integration within the organization.
Related positions

NVIDIA | Experienced hire

A key part of NVIDIA's strength is our sophisticated analysis and debugging tools, which empower NVIDIA engineers to improve the performance and power efficiency of our products and the applications running on them. We are looking for forward-thinking, hard-working, and creative people to join a multifaceted software team with high standards! This software engineering role involves developing tools for AI researchers and SW/HW teams running AI workloads on GPU clusters. As a member of the software development team, you will work with users from different departments, such as the Architecture and Software teams. Our work gives users intuitive, rich, and accurate insight into the workload and the system, and empowers them to find opportunities in software and hardware, build high-level models to propose and deliver the best hardware and software to our customers, and debug tricky failures and issues to help improve the performance and efficiency of the system.

What you’ll be doing:
• Build internal profiling and analysis tools for AI workloads at large scale
• Build debugging tools for commonly encountered problems, such as memory or networking issues
• Create benchmarking and simulation technologies for AI systems and GPU clusters
• Partner with HW architects to propose new features or improve existing ones based on real-world use cases

Updated 2025-06-19 | Shanghai
AMD | Experienced hire | Enginee

THE ROLE: AMD drives innovation at the intersection of performance and efficiency to shape the future of AI, cloud computing, and high-performance servers. We seek an experienced IC Connector Engineer to lead the design and development of high-speed connectors, cables, and sockets for advanced servers and AI platforms. This role requires deep technical expertise and program leadership to deliver reliable, cost-effective solutions at scale.

Updated 2025-12-02 | Shanghai
NVIDIA | Experienced hire

As chip sizes continue to grow, power efficiency has become paramount across all applications, from data centers to automotive and personal computing. Our PMU IP, developed over the past 13 years, is crucial in optimizing chip performance and efficiency in both idle and active scenarios. The PMU IP consists of a RISC-V core and custom-designed control logic. It collects and processes data from the entire chip, working in tandem with software running on the RISC-V core to determine optimal operating points. We are seeking a Senior ASIC Engineer who can help architect the next-generation PMU for AI data centers.

What you’ll be doing:
• Collaborate with the production SW team and the power architecture team to define the architecture and micro-architecture for various power features
• Learn how the PMU's functions impact the system, and support silicon debug
• Implement the micro-architecture in RTL

Updated 2025-07-03 | Shanghai
NVIDIA | Experienced hire

• Contribute to design reviews and product feature requirements across the whole Ethernet/NIC/DPU/Switch portfolio. Design and build setup topologies with an emphasis on emulating customers' large-scale, complex environments.
• Collaborate closely with multi-functional teams, including hardware engineers, software developers, and domain experts, to deliver optimized solutions that meet the demanding requirements of AI workloads.
• Design tests and mentor the test automation team in implementing them. Generate comprehensive test reports during the release execution procedure, assist with reproducing and debugging complex customer use cases, determine the root cause of issues, and serve as the engineering PIC for the full verification cycles of customer use cases.
• Complete end-to-end test scenarios in different scopes: regression, performance, functional, and scale. Report testing progress and provide summary reports of testing activity.
• Profile, benchmark, and analyze deep learning models to identify areas for optimization and improvement in performance, efficiency, and accuracy, with a strong emphasis on networking aspects.
• Provide insights and recommendations based on the analysis of large-scale training results, focusing specifically on networking bottlenecks and optimizations, to improve model outcomes and achieve business objectives.

Updated 2025-12-01 | Shanghai, Beijing