NVIDIA GPU Compiler LLVM Backend Intern - 2026
Qualifications
NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as "the AI computing company". We are searching for an LLVM Compiler Intern for an exciting and fun role in our GPU Software organization. Our Compiler team is responsible for constructing and emitting the highest-performance GPU machine instructions for Graphics (OpenGL, Vulkan, DX) and Compute (CUDA, PTX, OpenCL, Fortran, C++). The team is composed of world-leading compiler engineering experts who provide leading-edge performance and capabilities for NVIDIA's current and future complex parallel SIMT architectures. What You Will Be Doing: • Understand, modify, and improve an NVIDIA proprietary GPU compiler and assem…
Responsibilities
N/A
THE ROLE: Triton is a language and compiler for writing highly efficient custom deep learning primitives. It is widely adopted in open-source AI software stack projects such as PyTorch, vLLM, SGLang, and many others. AMD GPUs are an official backend in Triton, and we are fully committed to it. If you are interested in making GPUs run fast by developing the Triton compiler and kernels, please come join us!
As an AI acceleration and AI compiler expert, you will develop and optimize compilers and toolchains for AI algorithms, design and implement hardware/software co-design strategies, and drive efficient execution and deployment of AI technology across multiple platforms. 1. Design, develop, and optimize a compiler toolchain for cloud-based AI applications, improving the performance and efficiency of algorithms on hardware. 2. Research and implement advanced hardware/software co-design methods to optimize the runtime performance and energy efficiency of AI models. 3. Collaborate with algorithm R&D teams to ensure efficient deployment and execution of AI models. 4. Continuously track the latest industry developments, and evaluate and integrate new programming models and hardware/software technologies.
THE ROLE: MTS Software Development Engineer on teams building and optimizing deep learning applications and AI frameworks for AMD GPU compute platforms. Work as part of an AMD development team and the open-source community to analyze, develop, test, and deploy improvements that make AMD the best platform for machine learning applications. THE PERSON: Strong technical and analytical skills in C++ development in a Linux environment. Ability to work as part of a team while also being able to work independently, define goals and scope, and lead your own development effort. KEY RESPONSIBILITIES: • Optimize deep learning frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories. • Develop GPU kernels: Create and optimize GPU kernels to maximize performance for specific AI operations. • Develop and optimize models: Design and optimize deep learning models specifically for AMD GPU performance. • Collaborate with GPU library teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs. • Collaborate with open-source maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream. • Work in distributed computing environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems. • Utilize cutting-edge compiler technology: Leverage advanced compiler technologies to improve deep learning performance. • Optimize the deep learning pipeline: Enhance the full pipeline, including integrating graph compilers. • Software engineering best practices: Apply sound engineering principles to ensure robust, maintainable solutions.
• Design and implement the DSL and core compiler of a tile-aware GPU programming model for emerging GPU architectures • Continuously innovate and iterate on the core architecture of the compiler to consistently optimize performance • Investigate next-generation GPU architectures and provide solutions in the DSL and compiler stack • Analyze performance of emerging AI/LLM workloads and integrate with AI/ML frameworks