
NVIDIA: AI Computing Performance Architect, Perf Analysis and Kernel Dev

Experienced hire, full-time. Location: Shanghai. Status: Hiring.

Requirements


• MS or PhD in a relevant discipline (CS, EE, Math).
• 3+ years of industry experience in GPU programming or performance optimization for DL applications.
• Demonstrated experience in analyzing and improving the performance of GPU kernels, with measurable results (e.g. performance improvements, efficiency gains).
• Strong programming skills in C, C++, Perl, or Python.
• Strong background in computer architecture.
• Excellent communication skills, both written and verbal.
• Strong organizational and time management abilities, with the ability to prioritize tasks effectively.

Ways to stand out from the crowd:
• LLM FMHA- or GEMM-related development or optimization experience will be a plus.
• Expertise in CUDA programming for GPU acceleration will be a plus.
• Expertise in GPU/CPU Core or MemSys architecture modeling will be a plus.
#deeplearning

Responsibilities


• Design, develop, and optimize major layers in LLMs (e.g., attention, GEMM, inter-GPU communication) for NVIDIA's new architectures.
• Implement and fine-tune kernels to achieve optimal performance on NVIDIA GPUs.
• Conduct in-depth performance analysis of GPU kernels, including Attention and other critical operations.
• Identify bottlenecks, optimize resource utilization, and improve throughput and power efficiency.
• Create and maintain workloads and micro-benchmark suites to evaluate kernel performance across various hardware and software configurations.
• Generate performance projections, comparisons, and detailed analysis reports for internal and external stakeholders.
• Collaborate with architecture, software, and product teams to guide the development of next-generation deep learning hardware and software.
Related positions

NVIDIA, experienced hire

• Architect Performance Tooling: Develop infrastructure tools/libraries for GPU performance analysis, visualization, and automated workflows used across the GPU SW/HW development life cycle.
• Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance perf debug and profiling capabilities.
• AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate perf optimization guidance, and improve the user experience of the profiling infrastructure.
• Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to support performance studies, improve the CUDA software stack, and co-design performance-centric solutions for current and next-generation GPU architectures.

Updated 2025-09-18
NVIDIA, experienced hire

NVIDIA is a leading company in AI computing. At NVIDIA, our employees are passionate about AI, HPC, visual computing, and gaming. Our SA team focuses on bringing new NVIDIA technology into different industries. We help design the architecture of AI computing platforms and analyze AI and HPC applications to deliver value to customers. In this position, you will work closely with industry sales, developer relationship managers, and product teams.

What you'll be doing:
• Assist in supporting industry accounts and driving research/influencing/new business in those accounts.
• Assist researchers/engineers on their GPU applications.
• Deliver technical projects, demos and client support tasks as directed by the Solution Architecture leadership team.
• Provide technical support for GPU system deployments.
• Be an industry thought leader on integrating NVIDIA technology into applications built on Deep Learning, High Performance Data Analytics, Robotics, Signal Processing and other key applications.
• Be an internal champion for Data Analytics, Machine Learning, and Cyber among the NVIDIA technical community.

Updated 2025-06-17
T-Head, experienced hire, 7+ years of experience. Category: Technology, Chips.

We are seeking a highly skilled and innovative AI Chip Architect who will play a key role in the development of cutting-edge AI hardware. The ideal candidate is a visionary and a problem-solver, capable of designing complex chip architectures optimized for performance, efficiency, and scalability. In this role, you will work with software and hardware engineering groups to define state-of-the-art AI chip architecture for high-performance computing systems in the data center.

Key Responsibilities:
* Define the architecture for the next-generation AI chips, including the high-performance computing system, hierarchical memory/cache system, and high-speed interconnects.
* Collaborate with a cross-functional team of hardware engineers, software developers, and machine learning specialists to ensure designs meet the performance and power requirements of AI applications.
* Propose and evaluate architectural innovations to improve throughput, latency, energy efficiency, and scalability of AI processing.
* Produce thorough documentation to articulate design decisions and architectural trade-offs to stakeholders.
* Participate in design reviews, providing critical feedback and insights to improve chip quality and performance.
* Oversee and contribute to the entire lifecycle of the chip design process, from specification to production and post-production support.
* Mentor junior engineers and contribute to a culture of technical excellence.

Updated 2025-09-25
NVIDIA, experienced hire

NVIDIA networking designs and manufactures high-performance networking equipment that enables the most powerful supercomputers in the largest data centers in the world. With a distributed collection of NVIDIA GPUs interconnected by networking solutions such as InfiniBand, Ethernet, or RoCE (RDMA over Converged Ethernet), we make powerful ML/AI platforms possible. We are seeking motivated, personable, and independent individuals to join our team! We seek experienced embedded software engineers to help support our groundbreaking, innovative technologies that make AI workloads in large clusters even more performant. As a networking Sr. Solutions Architect at NVIDIA, you will have agency and a palpable effect on the business, and work closely with customers and R&D teams.

What you'll be doing:
• Support networking technologies such as Spectrum-X and work with customers on their technical challenges and requirements using those technologies during pre-sales activities
• Develop proof-of-concept materials for innovative technologies for use by early adopters
• Gain customers' trust and understand their needs to help design and deploy groundbreaking NVIDIA networking platforms to run AI and HPC workloads
• Address sophisticated and highly visible customer issues
• Work closely with R&D teams to develop new features for customers
• Help with product requirements alongside engineering and product marketing

Updated 2025-06-15