AMDGPU Kernel Development Engineer

AMD · Experienced hire, full-time · Engineering
Location: Shanghai · Status: Hiring

Qualifications


PREFERRED EXPERIENCE:

- Direct experience with AMD ROCm development (HIP, MIOpen, Composable Kernel).
- Knowledge of LLM-specific optimizations (e.g., FlashAttention, PagedAttention in vLLM).
- Experience with distributed training/inference or model compression techniques.
- Contributions to open-source ML projects or GPU compute libraries.

ACADEMIC CREDENTIALS:

- Bachelor's/Master's degree in Computer Science, Electrical Engineering, or a related field.
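The preferred qualifications above mention vLLM's PagedAttention. As a rough single-process sketch of its core idea (all names, sizes, and helpers here are illustrative, not vLLM's actual API), a paged KV cache stores each sequence's keys in fixed-size physical blocks addressed through a per-sequence block table, much like virtual-memory paging:

```python
import numpy as np

BLOCK_SIZE = 4      # tokens per KV-cache block (vLLM uses larger blocks; 4 keeps the demo small)
NUM_BLOCKS = 8      # total physical blocks in the shared pool
HEAD_DIM = 2        # embedding width per head

# Physical key pool: blocks are handed to sequences on demand, not pre-reserved.
key_pool = np.zeros((NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM))
free_blocks = list(range(NUM_BLOCKS))

def append_token(block_table, seq_len, key_vec):
    """Write one token's key into the paged cache, allocating a block when the
    current one is full (or on the first token)."""
    if seq_len % BLOCK_SIZE == 0:
        block_table.append(free_blocks.pop())
    block = block_table[seq_len // BLOCK_SIZE]
    key_pool[block, seq_len % BLOCK_SIZE] = key_vec
    return seq_len + 1

def gather_keys(block_table, seq_len):
    """Reassemble the logically contiguous keys from scattered physical blocks."""
    flat = key_pool[block_table].reshape(-1, HEAD_DIM)
    return flat[:seq_len]

# A 6-token sequence occupies 2 blocks; its logical order is preserved by the table.
table_a, len_a = [], 0
for t in range(6):
    len_a = append_token(table_a, len_a, np.full(HEAD_DIM, t))

keys_a = gather_keys(table_a, len_a)
print(keys_a[:, 0])   # -> [0. 1. 2. 3. 4. 5.]
```

The point of the scheme is that sequences of unknown final length share one pool without reserving max-length contiguous buffers, which is what makes high-batch LLM serving memory-efficient.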

Responsibilities


THE ROLE:
We are seeking a talented Machine Learning Kernel Developer to design, develop, and optimize low-level machine learning kernels for AMD GPUs using the ROCm software stack. In this role, you will work on high-impact projects to accelerate AI frameworks and libraries, with a focus on emerging technologies such as Large Language Models (LLMs) and other generative AI workloads.

THE PERSON:
The ideal candidate will have hands-on experience with GPU programming (ROCm or CUDA) and a passion for pushing the boundaries of AI performance.

KEY RESPONSIBILITIES:

- Design and implement highly optimized ML kernels (e.g., matrix operations, attention mechanisms) for AMD GPUs using ROCm.
- Profile, debug, and tune kernel performance to maximize hardware utilization for AI workloads.
- Collaborate with ML researchers and framework developers to integrate kernels into AI frameworks (e.g., PyTorch, TensorFlow) and inference engines (e.g., vLLM).
- Contribute to the ROCm software stack by identifying and resolving bottlenecks in libraries such as MIOpen, HIP, and Composable Kernel.
- Stay current on the latest AI/ML trends (LLMs, quantization, distributed inference) and apply them to kernel development.
- Document and communicate technical designs, benchmarks, and best practices.
- Troubleshoot and resolve issues related to GPU compatibility, performance, and scalability.

REQUIRED EXPERIENCE:

- 2+ years of experience in GPU kernel development for machine learning (ROCm or CUDA).
- Proficiency in C/C++ and Python, with experience in performance-critical programming.
- Strong understanding of ML frameworks (PyTorch, TensorFlow) and GPU-accelerated libraries.
- Basic knowledge of modern AI technologies (LLMs, transformers, inference optimization).
- Familiarity with parallel computing, memory optimization, and hardware architectures.
- Strong problem-solving skills and the ability to work in a fast-paced environment.
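The attention-kernel work described above typically follows the FlashAttention pattern: stream over key/value tiles while maintaining a running maximum and normalizer ("online softmax"), so the full score matrix is never materialized in memory. A minimal NumPy sketch of just that arithmetic, for a single query vector (this is the math, not a GPU kernel; function names are illustrative):

```python
import numpy as np

def attention_reference(q, K, V):
    """Naive attention for one query: materializes the full score vector."""
    s = K @ q
    p = np.exp(s - s.max())
    return (p / p.sum()) @ V

def attention_online(q, K, V, tile=4):
    """FlashAttention-style streaming: process K/V in tiles, carrying a running
    max `m` and normalizer `l` so earlier partial sums can be rescaled."""
    m, l = -np.inf, 0.0
    acc = np.zeros(V.shape[1])
    for i in range(0, len(K), tile):
        s = K[i:i + tile] @ q               # scores for this tile only
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)           # rescale previous partial results
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ V[i:i + tile]
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
K, V = rng.normal(size=(10, 8)), rng.normal(size=(10, 8))
q = rng.normal(size=8)
assert np.allclose(attention_online(q, K, V), attention_reference(q, K, V))
```

On a real GPU the same recurrence runs per tile held in LDS/shared memory, which is what turns attention from a memory-bound into a compute-bound kernel.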
Related Positions

AMD · Experienced hire, Engineering

THE ROLE:
MTS Software Development Engineer on teams building and optimizing deep learning applications and AI frameworks for AMD GPU compute platforms. Work as part of an AMD development team and the open-source community to analyze, develop, test, and deploy improvements that make AMD the best platform for machine learning applications.

THE PERSON:
Strong technical and analytical skills in C++ development in a Linux environment. Able to work as part of a team while also working independently: defining goals and scope and leading your own development effort.

KEY RESPONSIBILITIES:

- Optimize deep learning frameworks: in-depth experience enhancing and optimizing frameworks such as TensorFlow and PyTorch for AMD GPUs in open-source repositories.
- Develop GPU kernels: create and optimize GPU kernels to maximize performance for specific AI operations.
- Develop and optimize models: design and optimize deep learning models specifically for AMD GPU performance.
- Collaborate with GPU library teams: work closely with internal teams to analyze and improve training and inference performance on AMD GPUs.
- Collaborate with open-source maintainers: engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream.
- Work in distributed computing environments: optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems.
- Utilize cutting-edge compiler technology: leverage advanced compiler technologies to improve deep learning performance.
- Optimize the deep learning pipeline: enhance the full pipeline, including integrating graph compilers.
- Apply software engineering best practices to ensure robust, maintainable solutions.
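The scale-up/scale-out work above centers on collective operations such as all-reduce (provided by RCCL on AMD platforms). A toy single-process NumPy simulation of the classic ring algorithm, reduce-scatter followed by all-gather, which keeps per-rank traffic roughly constant as rank count grows (all names here are illustrative, not an RCCL API):

```python
import numpy as np

def ring_allreduce(rank_data):
    """Simulate ring all-reduce over N 'ranks'. Each rank's vector is split
    into N chunks; reduce-scatter circulates partial sums around the ring,
    then all-gather circulates the finished chunks."""
    n = len(rank_data)
    chunks = [np.array_split(d.astype(float), n) for d in rank_data]
    # Reduce-scatter: at step s, rank r sends chunk (r - s) % n to rank (r+1) % n.
    for s in range(n - 1):
        sent = [chunks[r][(r - s) % n].copy() for r in range(n)]
        for r in range(n):
            chunks[(r + 1) % n][(r - s) % n] += sent[r]
    # Now rank r owns the fully reduced chunk (r + 1) % n.
    # All-gather: at step s, rank r forwards chunk (r + 1 - s) % n to rank (r+1) % n.
    for s in range(n - 1):
        sent = [chunks[r][(r + 1 - s) % n].copy() for r in range(n)]
        for r in range(n):
            chunks[(r + 1) % n][(r + 1 - s) % n] = sent[r]
    return [np.concatenate(c) for c in chunks]

# Three 'ranks' each hold a different vector; all-reduce leaves every rank with the sum.
ranks = [np.arange(6) * (r + 1) for r in range(3)]
out = ring_allreduce(ranks)
assert all(np.array_equal(o, np.arange(6) * 6) for o in out)
```

Each rank sends and receives 2(N-1)/N of the data regardless of N, which is why the ring (and its tree/hierarchical refinements) is the workhorse for gradient averaging in scale-out training.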

Updated 2025-09-17
AMD · Experienced hire, Engineering

THE ROLE:
We are seeking a talented Machine Learning Kernel Developer to design, develop, and optimize low-level machine learning kernels for AMD GPUs using the ROCm software stack. In this role, you will work on high-impact projects to accelerate AI frameworks and libraries, with a focus on emerging technologies such as Large Language Models (LLMs) and other generative AI workloads.

THE PERSON:
The ideal candidate will have hands-on experience with GPU programming (ROCm or CUDA) and a passion for pushing the boundaries of AI performance.

KEY RESPONSIBILITIES:

- Design and implement highly optimized ML kernels (e.g., matrix operations, attention mechanisms) for AMD GPUs using ROCm.
- Profile, debug, and tune kernel performance to maximize hardware utilization for AI workloads.
- Collaborate with ML researchers and framework developers to integrate kernels into AI frameworks (e.g., PyTorch, TensorFlow) and inference engines (e.g., vLLM, SGLang).
- Contribute to the ROCm software stack by identifying and resolving bottlenecks in libraries such as MIOpen, BLAS, and Composable Kernel.
- Stay current on the latest AI/ML trends (LLMs, quantization, distributed inference) and apply them to kernel development.
- Document and communicate technical designs, benchmarks, and best practices.
- Troubleshoot and resolve issues related to GPU compatibility, performance, and scalability.

REQUIRED EXPERIENCE:

- 2+ years of experience in GPU kernel development for machine learning (ROCm or CUDA).
- Proficiency in C/C++ and Python, with experience in performance-critical programming.
- Strong understanding of ML frameworks (PyTorch, TensorFlow) and GPU-accelerated libraries.
- Basic knowledge of modern AI technologies (LLMs, transformers, inference optimization).
- Familiarity with parallel computing, memory optimization, and hardware architectures.
- Strong problem-solving skills and the ability to work in a fast-paced environment.

Updated 2025-09-17
NVIDIA · Experienced hire

• Develop production-quality software that ships as part of NVIDIA's AI software stack, including optimized large language model (LLM) support. • Analyze the performance of important workloads, tune our current software, and propose improvements for future software. • Work with cross-collaborative teams of deep learning software engineers and GPU architects to innovate across applications like generative AI, autonomous driving, computer vision, and recommender systems. • Adapt to the constantly evolving AI industry by being agile and excited to contribute across the codebase, including API design, software architecture, performance modeling, testing, and GPU kernel development.

Updated 2025-09-18
AMD · Experienced hire, Engineering

THE ROLE: Triton is a language and compiler for writing highly efficient custom deep learning primitives. It is widely adopted in open-source AI stack projects such as PyTorch, vLLM, SGLang, and many others. AMD GPU is an official backend in Triton, and we are fully committed to it. If you are interested in making GPUs run fast by developing the Triton compiler and kernels, please come join us!

Updated 2025-10-06