AMD AI Model Training Development Engineer
Job Requirements
Responsibilities:
Develop and optimize core training operators on AMD GPUs (GEMM, GroupedGEMM, Attention, DeepEP, etc.), continuously pursuing state-of-the-art performance.
Conduct in-depth analysis of performance bottlenecks in large-scale model training and drive targeted end-to-end performance optimizations.
Collaborate closely with AMD’s software and hardware teams to enhance the performance and stability of the ROCm ecosystem.
Participate in cutting-edge technology research, including but not limited to next-generation GPU hardware, compute-communication operator fusion, and AGI-driven automatic generation of high-performance operators.

Qualifications:
Solid foundation in computer architecture and high-performance computing.
Proficient in C/C++, familiar with GPU programming (HIP / CUDA) and parallel development languages such as Triton, with strong engineering implementation skills.
Familiar with parallel computing principles and GPU execution …
Job Responsibilities
N/A
THE ROLE: We are looking for a Machine Learning Engineer to join our Models and Applications team. If the challenge of distributed training of large models on a large number of GPUs excites you, and you are passionate about improving training efficiency and enjoy innovating and coming up with new ideas, then this role is for you. You will be part of a world-class team focused on addressing the challenge of training generative AI.
THE ROLE: AMD is looking for a world-class AI frameworks engineer who can provide technical leadership in the development of various AI frameworks in the AMD ecosystem. You will need to drive the technical direction of next-generation frameworks for AI model training and inference across a wide variety of AMD devices, current and future, such as MI Instinct and Radeon GPUs, XDNA devices, including the recently released Ryzen AI, Alveo V70 and Versal ACAP, and datacenter CPUs such as EPYC. You will work to enhance AI framework capabilities to enable cutting-edge models on AMD’s cutting-edge hardware.
THE ROLE: AMD is looking for a senior software engineer to join our growing team. As a key contributor, you will be part of a leading team to drive and enhance AMD’s abilities to deliver the highest quality, industry-leading technologies to market.
THE ROLE: MTS Software Development Engineer on teams building and optimizing deep learning applications and AI frameworks for AMD GPU compute platforms. Work as part of an AMD development team and the open-source community to analyze, develop, test, and deploy improvements that make AMD the best platform for machine learning applications.

THE PERSON: Strong technical and analytical skills in C++ development in a Linux environment. Able to work as part of a team while also working independently, defining goals and scope, and leading your own development effort.

KEY RESPONSIBILITIES:
Optimize Deep Learning Frameworks: Apply in-depth experience to enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories.
Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations.
Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance.
Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs.
Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream.
Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems.
Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance.
Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers.
Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions.