AMD AI Compiler Software Development Engineer
Job Requirements
GPU Kernel Development & Optimization: Strong experience designing and optimizing GPU kernels for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM). Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming to maximize performance for AI operations, leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance.
Deep Learning Integration: Strong experience integrating optimized GPU kernels into machine learning frameworks (e.g., TensorFlow, PyTorch) to accelerate model training and inference, with a focus on scaling and throughput.
Software Engineering: Strong skills in Python and C++, with experience in debugging, performance tuning, and test design to ensure high-quality, maintainable software solutions (a minimal profiling sketch follows this list).
High-Performance Computing: Strong experience running large-scale workloads on heterogeneous compute clusters, optimizing for efficiency and scalability.
Compiler Optimization: Expert understanding of compiler theory and tools like LLVM and ROCm for kernel and system performance optimization.
ACADEMIC CREDENTIALS: Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field. 5+ years of professional experience in technical software development, with a focus on GPU optimization, performance engineering, and/or framework development.
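To give a concrete flavor of the debugging and performance-tuning work described above, here is a minimal sketch using PyTorch's built-in profiler. The model and shapes are illustrative placeholders; on ROCm builds of PyTorch, the "cuda" device name and CUDA profiler activity map to the AMD GPU via HIP.

    import torch
    from torch.profiler import profile, ProfilerActivity

    # Placeholder model and input; on ROCm builds, "cuda" targets the AMD GPU.
    model = torch.nn.Linear(1024, 1024).cuda()
    x = torch.randn(64, 1024, device="cuda")

    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        with torch.no_grad():
            model(x)

    # Rank operators by GPU time to find tuning targets.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))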
Job Responsibilities
THE ROLE: MTS Software Development Engineer on teams building and optimizing deep learning applications and AI frameworks for AMD GPU compute platforms. Work as part of an AMD development team and the open-source community to analyze, develop, test, and deploy improvements that make AMD the best platform for machine learning applications.
THE PERSON: Strong technical and analytical skills in C++ development in a Linux environment. Able to work as part of a team while also working independently, defining goals and scope, and leading your own development effort.
KEY RESPONSIBILITIES:
Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories.
Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations.
Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance.
Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs.
Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream.
Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems (see the sketch after this list).
Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance.
Optimize the Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers.
Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions.
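As an illustration of the distributed-computing responsibility above, here is a minimal DistributedDataParallel training sketch, assuming a torchrun launch. The model and data are synthetic placeholders, and the "nccl" backend name maps to RCCL on ROCm builds of PyTorch.

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # torchrun sets RANK/LOCAL_RANK/WORLD_SIZE in the environment.
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        # Placeholder model and synthetic data.
        model = DDP(torch.nn.Linear(1024, 1024).cuda(), device_ids=[local_rank])
        opt = torch.optim.SGD(model.parameters(), lr=1e-3)

        for _ in range(10):
            x = torch.randn(32, 1024, device="cuda")
            loss = model(x).square().mean()
            opt.zero_grad()
            loss.backward()  # gradients are all-reduced across ranks here
            opt.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

A script like this is launched per node with, e.g., torchrun --nproc_per_node=8 ddp_sketch.py (a hypothetical filename), and the same code scales out across nodes given a rendezvous endpoint.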
THE ROLE: Triton is a language and compiler for writing highly efficient custom deep learning primitives. It is widely adopted in open AI software stack projects such as PyTorch, vLLM, SGLang, and many others. AMD GPUs are an official backend in Triton, and we are fully committed to it. If you are interested in making GPUs run fast by developing the Triton compiler and kernels, please come join us!
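For a concrete sense of the language, the canonical vector-add kernel from the upstream Triton tutorials is only a few lines; the wrapper function here is illustrative and not AMD-specific code.

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements  # guard the tail of the array
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = out.numel()
        grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

    x = torch.rand(98432, device="cuda")  # "cuda" is the AMD GPU on ROCm builds
    y = torch.rand(98432, device="cuda")
    assert torch.allclose(add(x, y), x + y)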
An exciting internship opportunity to make an immediate contribution to AMD's next generation of technology innovations awaits you! We have a multifaceted, high-energy work environment filled with a diverse group of employees, and we provide outstanding opportunities for developing your career. During your internship, our programs provide the opportunity to collaborate with AMD leaders, receive one-on-one mentorship, attend amazing networking events, and much more. Being part of AMD means receiving hands-on experience that will give you a competitive edge. Together We Advance your career!
JOB DETAILS:
Location: Beijing, China
Onsite/Hybrid: at least 3 days a week, in a hybrid, onsite, or remote work structure, throughout the duration of the co-op/intern term
Duration: at least 6 months
WHAT YOU WILL BE DOING: We are seeking a highly motivated AI Compiler Software Engineering intern/co-op to join our team. In this role:
We will involve you in extending Triton's compiler infrastructure to support new AI workloads and hardware targets.
We will assign you tasks to implement and optimize GPU kernels using Triton's Python-based DSL.
We will train you to analyze kernel performance using profiling tools and help you identify bottlenecks and optimization opportunities (a minimal benchmarking sketch follows this list).
You will learn how modern compilers translate high-level abstractions into efficient machine code.
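As a taste of the kernel-performance task above, Triton ships a small benchmarking helper, triton.testing.do_bench. A minimal sketch timing an arbitrary placeholder op (swap in the kernel under study) might look like this:

    import torch
    import triton.testing

    # Placeholder op and shape, chosen only for illustration.
    x = torch.randn(4096, 4096, device="cuda")
    ms = triton.testing.do_bench(lambda: torch.softmax(x, dim=-1))
    print(f"softmax: {ms:.3f} ms")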
A key part of NVIDIA's strength is our sophisticated analysis and debugging tools, which empower NVIDIA engineers to improve the performance and power efficiency of our products and the applications running on them. We are looking for forward-thinking, hard-working, and creative people to join a multifaceted software team with high standards! This software engineering role involves developing tools for AI researchers and SW/HW teams running AI workloads on GPU clusters. As a member of the software development team, you will work with users from different departments, such as architecture and software teams. Our work gives users intuitive, rich, and accurate insight into the workload and the system, and empowers them to find opportunities in software and hardware, build high-level models to propose and deliver the best hardware and software to our customers, and debug tricky failures and issues to help improve the performance and efficiency of the system.
What you'll be doing:
• Build internal profiling and analysis tools for AI workloads at large scale (see the sketch below)
• Build debugging tools for commonly encountered problems, such as memory or networking issues
• Create benchmarking and simulation technologies for AI systems and GPU clusters
• Partner with HW architects to propose new features or improve existing features based on real-world use cases
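For flavor only (not the team's actual tooling), the smallest version of such a profiling tool can be sketched in Python with the NVML bindings; the device index and poll interval are arbitrary choices.

    import time
    import pynvml  # NVML bindings: pip install nvidia-ml-py

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU, for illustration

    for _ in range(5):  # poll a few samples
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"gpu={util.gpu}% mem={mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB")
        time.sleep(1)

    pynvml.nvmlShutdown()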