
NVIDIA Computer Architecture Intern - LLM, 2026

Internship / Part-time · Location: Shanghai · Status: Hiring

Requirements


• Proven experience in software engineering, particularly in GPU programming and LLM inference.
• Strong proficiency in programming languages such as Python, C++, and CUDA.
• A solid understanding of deep learning frameworks and techniques.
• Outstanding problem-solving skills and the ability to work collaboratively in a tea…

Responsibilities


• Develop and refine software solutions that accelerate the LLM software stack (in the inference, post-training, or pre-training phase) by harnessing the power of GPU technology.
• Collaborate closely with a world-class team of engineers to implement and refine GPU-based algorithms.
• Analyze and determine the most effective methods to improve performance, ensuring seamless execution across diverse computing environments.
• Engage in both individual and team projects, contributing to NVIDIA's mission of leading the AI revolution.
• Work in an empowering and inclusive environment to successfully implement groundbreaking AI solutions.
English materials included
Keywords: LLM, Python, C, CUDA
Related Positions

Internship

N/A

Updated 2026-01-08 · Shanghai | Beijing
Internship

NVIDIA is developing processor and system architectures that accelerate deep learning and high-performance computing applications. We are looking for an intern deep learning system performance architect to join our AI performance modelling, analysis, and optimization efforts. In this position, you will work on DL performance modelling, analysis, and optimization on state-of-the-art hardware architectures for various LLM workloads, and make your contributions to our dynamic, technology-focused company.

What you'll be doing:

• Analyze state-of-the-art DL networks (LLMs etc.), and identify and prototype performance opportunities to influence the SW and Architecture teams for NVIDIA's current and next-gen inference products.
• Develop analytical models of state-of-the-art deep learning networks and algorithms to innovate processor and system architecture designs for performance and efficiency.
• Specify hardware/software configurations and metrics to analyze performance, power, and accuracy in existing and future uni-processor and multiprocessor configurations.
• Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, software, and product teams.
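The analytical-modelling work described above often starts from a first-order roofline estimate: compare a layer's arithmetic intensity (FLOPs per byte of memory traffic) against the machine balance to decide whether it is compute-bound or memory-bound. A minimal sketch follows; the peak-throughput and bandwidth numbers and the GEMM shapes are illustrative assumptions, not the spec of any particular GPU.

```python
# First-order roofline estimate for a GEMM layer.
# peak_flops and peak_bw below are illustrative placeholders,
# not any specific GPU's spec sheet.

def gemm_roofline(m, n, k, bytes_per_elem=2,
                  peak_flops=1e15, peak_bw=3e12):
    """Return (arithmetic intensity, bound, estimated time in s)
    for an (m x k) @ (k x n) GEMM, assuming A, B, and C each
    move through memory exactly once."""
    flops = 2 * m * n * k                                # multiply-adds
    traffic = bytes_per_elem * (m * k + k * n + m * n)   # A + B + C
    intensity = flops / traffic                          # FLOPs per byte
    t_compute = flops / peak_flops
    t_memory = traffic / peak_bw
    bound = "compute" if t_compute >= t_memory else "memory"
    return intensity, bound, max(t_compute, t_memory)

# Large square GEMM (prefill-like): high intensity, compute-bound.
ai_big, bound_big, _ = gemm_roofline(8192, 8192, 8192)   # bound_big == "compute"

# m = 1 GEMV (batch-1 decode-like): intensity near 1, memory-bound.
ai_small, bound_small, _ = gemm_roofline(1, 8192, 8192)  # bound_small == "memory"
```

This simple model already explains a well-known LLM-serving behavior: prefill GEMMs tend to be compute-bound, while batch-1 decode is dominated by weight traffic and thus memory-bound.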

Updated 2025-11-10 · Shanghai
Internship

• Working directly with key application developers (especially in LLM) to understand the current and future problems they are solving, and creating and optimizing core parallel algorithms and data structures to provide the best solutions using GPUs, through both library development and direct contribution to the applications. This includes training and inference optimization for large language models, directly contributing to frameworks such as Megatron, TRTLLM, SGLang, vLLM…
• Collaborating closely with the architecture, research, libraries, tools, and system software teams at NVIDIA to influence the design of next-generation architectures, software platforms, and programming models, including by investigating the impact on application performance and developer productivity.
• Engaging in deep optimization of high-performance operators, including but not limited to deep CUDA optimization and instruction and compiler optimization. These optimizations will directly support customers or be integrated into products such as cuDNN, cuBLAS, and CUTLASS…

Updated 2025-12-31 · Beijing | Shanghai
Internship

• Design, develop, and optimize major layers in LLMs (e.g., attention, GEMM, inter-GPU communication) for NVIDIA's new architectures.
• Implement and fine-tune kernels to achieve optimal performance on NVIDIA GPUs.
• Conduct in-depth performance analysis of GPU kernels, including attention and other critical operations.
• Identify bottlenecks, optimize resource utilization, and improve throughput and power efficiency.
• Create and maintain workloads and micro-benchmark suites to evaluate kernel performance across various hardware and software configurations.
• Generate performance projections, comparisons, and detailed analysis reports for internal and external stakeholders.
• Collaborate with architecture, software, and product teams to guide the development of next-generation deep learning hardware and software.
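The micro-benchmarking bullet above follows a standard pattern: run warm-up iterations that are not recorded, then time many repeats and report a robust statistic such as the median. A minimal Python sketch of that harness is below; the naive matmul is a CPU stand-in workload of my own choosing, where a real suite would time a GPU kernel launch followed by a device synchronize.

```python
import statistics
import time

def bench(fn, warmup=3, repeats=10):
    """Median wall-clock time of fn() over `repeats` timed runs,
    after `warmup` untimed runs to warm caches and JITs."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

def matmul(a, b):
    """Naive (n x k) @ (k x m) matmul, used here only as a workload."""
    k, m = len(b), len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(k)) for j in range(m)]
            for row in a]

n = 64
a = [[1.0] * n for _ in range(n)]
t = bench(lambda: matmul(a, a))
gflops = 2 * n**3 / t / 1e9   # achieved throughput of this workload
```

Reporting the median rather than the minimum or mean makes the result less sensitive to scheduler noise and one-off outliers, which matters when comparing kernels across hardware and software configurations.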

Updated 2026-01-20 · Shanghai | Beijing