AMD AI Architect (GPU)
Job Requirements
• Experience using LLMs, or designing an agent, to generate GPU kernel code (CUDA, CUTLASS, Triton, HIP) or other kernel code (an illustrative sketch follows this list).
• Experience using LLMs, or designing an agent pipeline, for domain-specific or vertical applications.
• Hands-on experience with AI tools (e.g. PyTorch, vLLM, Megatron-LM, TensorFlow, Dynamo, DeepSpeed, TensorRT-LLM, TensorR…
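For illustration only (not part of the posting): below is a minimal Triton vector-add kernel, a simple example of the kind of GPU kernel code an LLM-driven generation agent would be expected to produce and validate. The kernel and wrapper names are hypothetical.

```python
# Minimal sketch: a Triton vector-add kernel of the kind an LLM/agent
# pipeline might generate. Assumes triton and torch are installed.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # one program per block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the tail block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)                    # 1D launch grid
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```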
Job Responsibilities
THE ROLE: The “AI Product Applications Engineer (Solution Architect) – China” position is in the AMD AI group, located in China.

THE PERSON: Success in this role will require deep knowledge of Data Center, Client, and Endpoint AI workloads such as LLM, Generative AI, Recommendation, and/or transformer … AI across cloud, client, edge… The candidate needs hands-on experience with various AI models, end-to-end pipelines, and industry frameworks (PyTorch, vLLM, SGLang, llm-d, Triton) / SDKs and solutions.

KEY RESPONSIBILITIES:
• Position technical proposals / enablement (blogs, tutorials, user guides…) to AI SW developers and/or top customers.
• Make significant contributions to AI SW developers / communities and/or customer PoC success.
• Drive AI developer / community / customer requirements for AI SW and solution roadmap planning.
• Analyze competitive solutions to identify strengths and weaknesses for articulating AMD AI SW & solution value propositions.
• Provide input / feedback to the AI SW / hardware silicon / board roadmap for AI across cloud, client, and edge...
• Capture business requirements, translate them into functional designs, user stories, and technical designs; drive end-to-end integration testing; support data setup and issue remediation during UAT; manage development team activities; and develop a hypercare support model.
• Define and architect AI agents for Supply Chain use cases, using the right frameworks, multi-agent coordination, RAG, deployment, monitoring, and lifecycle management (see the sketch after this list).
• Be hands-on in quick proof-of-concept development to demonstrate technical feasibility and implement enterprise-grade Agentic Supply Chain solutions.
• Partner with Enterprise IT engineering, product, and research teams while evaluating LLMs, agentic frameworks, and NVIDIA’s own NeMo technologies.
• Ensure integration with enterprise IT and Operations data sources and the industry’s best agentic platforms, with a strong content-security focus.
• Drive architectural decisions across deployment models (on-prem, cloud, hybrid, containerized) to deliver scalable, reliable, and efficient solutions.
• Lead design reviews, develop technical documentation, and guide developers in principles of architecture and code development.
• Champion observability, monitoring, versioning, and telemetry to ensure trustworthy and auditable AI agents.
• Influence Supply Chain Operations adoption of the platform by partnering with stakeholders across IT and supply chain, and serve as a reference adopter providing feedback to strengthen NVIDIA’s ecosystem.
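For illustration only (not part of the posting): a framework-free Python sketch of the RAG pattern named above, i.e. retrieve relevant context, then ground the LLM prompt in it. The toy corpus, the word-overlap retriever, and all names are hypothetical placeholders, not any specific agentic framework's API.

```python
# Minimal sketch of retrieval-augmented generation (RAG):
# rank documents against the query, then build a grounded prompt.
from collections import Counter

DOCUMENTS = {
    "po-guide": "Purchase orders must be approved before supplier dispatch.",
    "inv-policy": "Safety stock is recomputed weekly from demand forecasts.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by naive word overlap with the query (toy retriever)."""
    q = Counter(query.lower().split())
    scored = sorted(
        DOCUMENTS.items(),
        key=lambda kv: -sum(q[w] for w in kv[1].lower().split()),
    )
    return [text for _, text in scored[:k]]

def build_prompt(query: str) -> str:
    """Ground the (hypothetical) LLM call in retrieved context."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

print(build_prompt("When is safety stock recomputed?"))
```

In production the toy retriever would be replaced by an embedding index over enterprise data sources, which is where the integration and content-security concerns above come in.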
- As an AI/ML Specialist Solutions Architect (SA) in AI Infrastructure, you will serve as the Subject Matter Expert (SME) for providing optimal solutions for model training and inference workloads that leverage Amazon Web Services accelerated computing services. As part of the Specialist Solutions Architecture team, you will work closely with other Specialist SAs to enable large-scale customer model workloads and drive adoption of AWS EC2, EKS, ECS, SageMaker, and other computing platforms for GenAI practice.
- You will interact with other SAs in the field, providing guidance on their customer engagements, and you will develop white papers, blogs, reference implementations, and presentations to enable customers and partners to fully leverage AI Infrastructure on Amazon Web Services. You will also create field enablement materials for the broader SA population to help them understand how to integrate Amazon Web Services GenAI solutions into customer architectures.
- You must have deep technical experience with technologies related to Large Language Models (LLMs), Stable Diffusion, and many other SOTA model architectures, from model design and fine-tuning to distributed training and inference acceleration (a minimal distributed-training sketch follows this list). A strong machine learning development background is preferred, in addition to experience building applications and designing architectures. You will be familiar with the NVIDIA ecosystem and related technical options, and will leverage this knowledge to help Amazon Web Services customers in their selection process.
- Candidates must have great communication skills and be very technical and hands-on, with the ability to impress Amazon Web Services customers at any level, from ML engineers to executives. Previous experience with Amazon Web Services is desired but not required, provided you have experience building large-scale solutions. You will get the opportunity to work directly with senior engineers at customers, partners, and Amazon Web Services service teams, influencing their roadmaps and driving innovations.
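For illustration only (not part of the posting): a minimal PyTorch DistributedDataParallel (DDP) sketch of the kind of distributed-training workload described above. The model, batch, and loss are placeholder assumptions; it assumes launch via torchrun, which sets the environment variables that init_process_group reads.

```python
# Minimal sketch: one DistributedDataParallel (DDP) training step.
# Assumed launch: torchrun --nproc_per_node=<gpus> this_script.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")               # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # placeholder model
    ddp_model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(ddp_model.parameters(), lr=1e-4)

    x = torch.randn(32, 1024, device=local_rank)          # dummy batch
    loss = ddp_model(x).square().mean()                   # placeholder loss
    loss.backward()                                       # grads all-reduced here
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```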
• Design, develop, and optimize major layers in LLMs (e.g., attention, GEMM, inter-GPU communication) for NVIDIA's new architectures.
• Implement and fine-tune kernels to achieve optimal performance on NVIDIA GPUs.
• Conduct in-depth performance analysis of GPU kernels, including attention and other critical operations.
• Identify bottlenecks, optimize resource utilization, and improve throughput and power efficiency.
• Create and maintain workloads and micro-benchmark suites to evaluate kernel performance across various hardware and software configurations (see the timing sketch after this list).
• Generate performance projections, comparisons, and detailed analysis reports for internal and external stakeholders.
• Collaborate with architecture, software, and product teams to guide the development of next-generation deep learning hardware and software.
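For illustration only (not part of the posting): a minimal GEMM micro-benchmark using CUDA events, the basic building block of the kernel benchmark suites mentioned above. The sizes and iteration counts are arbitrary assumptions.

```python
# Minimal sketch: timing a GEMM kernel with CUDA events.
import torch

def bench_gemm(m=4096, n=4096, k=4096, warmup=10, iters=100):
    a = torch.randn(m, k, device="cuda", dtype=torch.float16)
    b = torch.randn(k, n, device="cuda", dtype=torch.float16)

    for _ in range(warmup):                       # warm up clocks and caches
        torch.mm(a, b)

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        torch.mm(a, b)
    end.record()
    torch.cuda.synchronize()                      # wait for all kernels to finish

    ms = start.elapsed_time(end) / iters
    tflops = 2 * m * n * k / (ms * 1e-3) / 1e12   # 2*M*N*K FLOPs per GEMM
    print(f"{ms:.3f} ms/iter, {tflops:.1f} TFLOP/s")

if __name__ == "__main__":
    bench_gemm()
```

Achieved TFLOP/s compared against the device's peak is the usual starting point for the bottleneck and efficiency analysis the role describes.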