
AMD AI Framework Engineer

Experienced hire · Full-time · Engineering · Location: Shanghai · Status: Hiring

Qualifications


1. Expertise in Inference Frameworks: Proven, hands-on experience with vLLM or SGLang, including a deep understanding of their source code, deployment, configuration, and performance tuning. (Please describe relevant projects in your resume.)
2. Mastery of Model Architectures: In-depth understanding and practical experience with the inference workflows of mainstream LLMs (e.g., DeepSeek, Qwen), including their tokenizers, model configurations, and architecture definitions.
3. Strong Theoretical Foundation: Solid grasp of the principles behind Transformer, Self-Attention, MoE, and KV Cache, and of their impact on inference performance (a back-of-the-envelope sizing sketch follows this list).
4. Proven Optimization Experience: Familiarity with end-to-end LLM inference optimization techniques such as PagedAttention, FlashAttention, continuous/dynamic batching, and quantization…
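For the KV Cache point in item 3 above, a back-of-the-envelope sizing sketch; the layer count, KV-head count, and head dimension below are hypothetical, chosen only to illustrate why GQA and paged allocation matter for throughput:

```python
# Rough KV-cache sizing for a decoder-only Transformer.
# 2x accounts for storing both K and V; every layer keeps one head_dim-sized
# vector per KV head per token, at dtype_bytes per element (2 for fp16/bf16).
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * dtype_bytes

# Hypothetical 32-layer model with GQA (8 KV heads, head_dim 128), one 4k-token request:
per_request = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                             seq_len=4096, batch=1)
print(f"{per_request / 2**20:.0f} MiB of KV cache per 4k-token request")  # 512 MiB
# With 64 full MHA heads instead of 8 KV heads the cache would be 8x larger,
# which is why GQA and PagedAttention's block-level allocation matter for throughput.
```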

Responsibilities


Position Overview

We are seeking a highly experienced engineer specializing in large language model (LLM) inference performance optimization. As a core member of the team, you will be responsible for building and optimizing high-throughput, low-latency LLM inference on AMD Instinct GPUs. If you are passionate about pushing performance boundaries and have deep, hands-on expertise with cutting-edge technologies like vLLM or SGLang, we invite you to join us.

Key Responsibilities

1. Core System Optimization: Lead the development, tuning, and customization of LLM performance optimizations on AMD GPUs, leveraging and extending frameworks like vLLM or SGLang to address performance bottlenecks in production environments (a configuration sketch follows this list).
2. Performance Analysis & Tuning: Conduct end-to-end performance profiling using specialized tools. Perform deep optimization of compute-bound operators (e.g., Attention), memory I/O, and communication to significantly increase throughput and reduce latency.
3. Model Architecture Adaptation: Demonstrate expertise in mainstream LLM architectures (e.g., DeepSeek, Qwen, Llama, ChatGLM) and optimize inference for their specific characteristics (e.g., RoPE, SWA, MoE, GQA).
4. Algorithm & Principle Application: Leverage a deep understanding of core algorithms (Transformer, Attention, MoE) to implement advanced optimization techniques such as PagedAttention, FlashAttention, continuous batching, quantization, and model compression.
5. Technology Foresight & Implementation: Research and prototype state-of-the-art optimization techniques (e.g., Speculative Decoding, Weight-Only Quantization) and drive their adoption into production systems.

Qualifications: Mandatory
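As a concrete illustration of the framework configuration surface mentioned in Responsibility 1, here is a minimal vLLM offline-inference sketch; the model id, GPU count, and knob values are assumptions for illustration, not requirements from this posting:

```python
from vllm import LLM, SamplingParams

# Minimal offline-inference setup; the settings below trade memory headroom,
# batch concurrency, and parallelism against throughput and latency.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # assumed checkpoint, for illustration only
    tensor_parallel_size=2,            # shard weights across 2 GPUs
    gpu_memory_utilization=0.90,       # fraction of VRAM for weights + paged KV cache
    max_num_seqs=256,                  # cap on sequences batched continuously per step
    dtype="bfloat16",
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain continuous batching in one sentence."], params)
print(outputs[0].outputs[0].text)
```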
Related Positions

AMD
Experienced hire · Enginee…

THE ROLE:  As a core member of the team, you will play a pivotal role in optimizing and developing deep learning frameworks for AMD GPUs. Your experience will be critical in enhancing GPU kernels, deep learning models, and training/inference performance across multi-GPU and multi-node systems. You will engage with both internal GPU library teams and open-source maintainers to ensure seamless integration of optimizations, utilizing cutting-edge compiler technologies and advanced engineering principles to drive continuous improvement.

Updated 2025-10-25 · Shanghai
AMD
Experienced hire · Enginee…

THE ROLE: AMD is looking for a world-class AI frameworks engineer who can provide technical leadership in the development of various AI frameworks in the AMD ecosystem. You will drive technical direction for next-generation frameworks for AI model training and inference across a wide variety of AMD devices, current and future, such as Instinct MI and Radeon GPUs, XDNA devices including the recently released Ryzen AI, Alveo V70 and Versal ACAP, and datacenter CPUs such as EPYC. You will enhance AI framework capabilities to enable cutting-edge models on AMD's cutting-edge hardware.

Updated 2025-08-21 · Beijing
NVIDIA
Experienced hire

NVIDIA is now looking for LLM Training Framework Engineers for the Megatron Core team. Megatron Core is an open-source, scalable, cloud-native framework built for researchers and developers working on Large Language Model (LLM) and Multimodal (MM) foundation model pretraining and post-training. Our GenAI frameworks provide end-to-end model training, including pretraining, alignment, customization, evaluation, deployment, and tooling to optimize performance and user experience. You will build on Megatron Core's capabilities by inventing advanced distributed training algorithms and model optimizations, and collaborate with partners to implement optimized solutions.

What you’ll be doing:
• Build and develop the open-source Megatron Core.
• Address large-scale AI training and inference obstacles across the entire model lifecycle, including orchestration, data pre-processing, model training and tuning, and model deployment.
• Work at the intersection of AI applications, libraries, frameworks, and the entire software stack.
• Spearhead advancements in model architectures, distributed training strategies, and model-parallel approaches.
• Accelerate foundation model training and optimization through mixed-precision formats and advanced NVIDIA GPU architectures (a minimal mixed-precision sketch follows this list).
• Tune and optimize the performance of deep learning framework and software components.
• Research, prototype, and develop robust and scalable AI tools and pipelines.
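To make the mixed-precision bullet above concrete, here is a minimal PyTorch sketch of a bf16 autocast training step; the toy model and shapes are hypothetical, and Megatron Core applies this kind of recipe at far larger, model-parallel scale:

```python
import torch
from torch import nn

# Toy single-GPU training step with bfloat16 autocast.
model = nn.Linear(4096, 4096).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 4096, device="cuda")
target = torch.randn(8, 4096, device="cuda")

optimizer.zero_grad()
# Run the forward pass and loss in bf16 to use tensor cores and cut memory
# traffic, while optimizer state and master weights stay in fp32.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()
```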

Updated 2025-10-13 · Shanghai
Microsoft
Experienced hire · Technolo…

• Drive technical sales with decision makers, using demos and PoCs to influence solution design and enable production deployments.
• Lead hands-on engagements (hackathons, code-with sessions, and architecture workshops) to accelerate adoption of Microsoft’s developer tools and cloud platforms.
• Build trusted relationships with developers and platform leads, co-designing secure, scalable architectures and solutions.
• Resolve technical blockers and objections, collaborating with engineering to share insights and improve products.
• Maintain deep expertise in AI Foundry & App architecture (Agentic AI framework, Semantic Kernel, Foundry SDK, Responsible AI) and app architecture/cloud-native development (APIs, containerization, microservices, event-driven, Python, Java or .NET).
• Maintain and grow expertise in AI Management & Security (Gen AI Ops, Sentinel, orchestrator, monitoring).
• Represent Microsoft through thought leadership in developer communities and customer forums.

Updated 2025-09-26 · Shenzhen