SAP China iXp Intern - Generative AI Developer - Shanghai
Job Requirements
We're looking for someone who takes initiative, perseveres, and stays curious. You like to work on meaningful, innovative projects and are energized by lifelong learning. Currently pursuing a full-time master's degree in computer science or a related field; knowledge of and experience in Generative AI, machine learning, or data analytics would be advantageous. Familiarity with mainstream programming languages, such as Python, TypeScript/JavaScript, or Java.…
Job Responsibilities
Engage with the latest BTP and Generative AI technologies, applying them to solve real-world business challenges. Collaborate globally with cross-functional teams to design, prototype, and test innovative solutions within the BTP ecosystem. Participate in ideation sessions and contribute to the planning of new technology implementations.
We are looking for a Generative AI Intern Engineer to join the NVIDIA Developer Technology group (Devtech) and work with a team of experienced engineers on innovative uses of AI for games and content creation. The Devtech team works with NVIDIA researchers and leading game developers to bring cutting-edge AI research from across NVIDIA and the industry to gamers and 3D professionals in high-performance packages such as real-time inferenced graphics, physics and animations.
What you’ll be doing:
• Research and implement innovative generative AI algorithms for game engines and authoring tools, including real-time neural graphics, physics-based animation and diffusion models.
• Develop neural graphics, animation and physics models and maintain open-source projects for both game-making and user runtimes. Integrate them into mainstream game engines and DCC tools.
• Use various optimization techniques, such as tensor fusion and quantization, to fit the AI models onto user devices and maximize inference performance for real-time gaming.
• Collaborate with game developers on optimizations and improvements for specific GenAI applications.
• Interact closely with the architecture and driver teams at NVIDIA to ensure the best possible experience on current-generation hardware and to determine trends and features for next-generation architectures.
THE ROLE: We are seeking a talented Machine Learning Kernel Developer to design, develop, and optimize low-level machine learning kernels for AMD GPUs using the ROCm software stack. In this role, you will work on high-impact projects to accelerate AI frameworks and libraries, with a focus on emerging technologies like Large Language Models (LLMs) and other generative AI workloads.
THE PERSON: The ideal candidate will have hands-on experience with GPU programming (ROCm or CUDA) and a passion for pushing the boundaries of AI performance.
KEY RESPONSIBILITIES:
• Design and implement highly optimized ML kernels (e.g., matrix operations, attention mechanisms) for AMD GPUs using ROCm.
• Profile, debug, and tune kernel performance to maximize hardware utilization for AI workloads.
• Collaborate with ML researchers and framework developers to integrate kernels into AI frameworks (e.g., PyTorch, TensorFlow) and inference engines (e.g., vLLM, SGLang).
• Contribute to the ROCm software stack by identifying and resolving bottlenecks in libraries like MIOpen, BLAS, or Composable Kernel.
• Stay updated on the latest AI/ML trends (LLMs, quantization, distributed inference) and apply them to kernel development.
• Document and communicate technical designs, benchmarks, and best practices.
• Troubleshoot and resolve issues related to GPU compatibility, performance, and scalability.
REQUIRED EXPERIENCE:
• 2+ years of experience in GPU kernel development for machine learning (ROCm or CUDA).
• Proficiency in C/C++ and Python, with experience in performance-critical programming.
• Strong understanding of ML frameworks (PyTorch, TensorFlow) and GPU-accelerated libraries.
• Basic knowledge of modern AI technologies (LLMs, transformers, inference optimization).
• Familiarity with parallel computing, memory optimization, and hardware architectures.
• Problem-solving skills and ability to work in a fast-paced environment.
THE ROLE: We are seeking a talented Machine Learning Kernel Developer to design, develop, and optimize low-level machine learning kernels for AMD GPUs using the ROCm software stack. In this role, you will work on high-impact projects to accelerate AI frameworks and libraries, with a focus on emerging technologies like Large Language Models (LLMs) and other generative AI workloads.
THE PERSON: The ideal candidate will have hands-on experience with GPU programming (ROCm or CUDA) and a passion for pushing the boundaries of AI performance.
KEY RESPONSIBILITIES:
• Design and implement highly optimized ML kernels (e.g., matrix operations, attention mechanisms) for AMD GPUs using ROCm.
• Profile, debug, and tune kernel performance to maximize hardware utilization for AI workloads.
• Collaborate with ML researchers and framework developers to integrate kernels into AI frameworks (e.g., PyTorch, TensorFlow) and inference engines (e.g., vLLM).
• Contribute to the ROCm software stack by identifying and resolving bottlenecks in libraries like MIOpen, HIP, or Composable Kernel.
• Stay updated on the latest AI/ML trends (LLMs, quantization, distributed inference) and apply them to kernel development.
• Document and communicate technical designs, benchmarks, and best practices.
• Troubleshoot and resolve issues related to GPU compatibility, performance, and scalability.
REQUIRED EXPERIENCE:
• 2+ years of experience in GPU kernel development for machine learning (ROCm or CUDA).
• Proficiency in C/C++ and Python, with experience in performance-critical programming.
• Strong understanding of ML frameworks (PyTorch, TensorFlow) and GPU-accelerated libraries.
• Basic knowledge of modern AI technologies (LLMs, transformers, inference optimization).
• Familiarity with parallel computing, memory optimization, and hardware architectures.
• Problem-solving skills and ability to work in a fast-paced environment.