
NVIDIA Senior Solutions Architect, CSP System

Experienced hire | Full-time | Location: Shanghai | Status: Hiring

Requirements


• Bachelor’s/Master’s/PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field; equivalent industry experience is highly valued.
• 8+ years of hands-on experience in GPU architecture, AI system optimization, large-scale data center infrastructure, or hyperscale cloud computing, with solid experience in AI training/inference, distributed computing or HPC workloads.
• Deep understanding of GPU microarchitecture, CUDA programming model, GPU memory hierarchy and system scheduling mechanisms; proficient in performance profiling, bottleneck analysis and end-to-end AI workload tuning.
• Strong programming proficiency in C/C++ and Python; familiar with CUDA kernels, compiler toolchains, AI framework optimization (PyTorch/TensorRT) and large-scale distributed system tuning.
• Proven hands-on experience working with major Chinese CSPs or global hyperscalers, with in-depth knowledge of their public cloud AI service architectures, cluster operation mechanisms and core workload characteristics.
• Excellent technical communication and presentation skills, capable of explaining complex GPU system and AI infra technologies to technical engineers, architecture teams and business stakeholders.
• Strong cross-functional collaboration capability, able to work efficiently in a global matrix team and prioritize multiple high-value technical projects under fast-paced business demands.
• Familiarity with NVIDIA full-stack products (GPU data center hardware, TensorRT-LLM, Dynamo, NCCL, CUDA software stack) is a signifi…

Responsibilities


• Partner with Sales, BD and CPM teams to land NVIDIA GPU and AI Infra technologies in top-tier Chinese CSP accounts, driving technical adoption and sustainable business growth.
• Serve as the primary technical authority on NVIDIA GPU system and AI infrastructure solutions for Chinese CSPs, providing end-to-end consultation on GPU cluster architecture design, AI workload deployment, heterogeneous computing tuning, and full-stack software optimization.
• Unlock the value of Vera CPU + GPU co-optimization for RL training and Agentic AI workloads: eliminate CPU-GPU data movement bottlenecks and optimize the latency and throughput of end-to-end agent training and reasoning pipelines for CSP AI factory scenarios.
• Lead open-source system architecture contributions for NVIDIA AI infra stacks, upstream optimized patches to key open-source projects, build China-localized best practices, and shape industry technical standards.
• Conduct in-depth GPU workload bottleneck analysis; implement system-level, kernel-level and framework-level tuning for AI training, inference, RL and gaming workloads; and deliver production-ready reference designs and tuning guidelines for CSP mass deployment.
• Act as the key technical liaison between Chinese CSP customers and NVIDIA global engineering, product and R&D teams: collect high-value local workload requirements, drive product roadmap iteration, and ensure full compliance with NVIDIA global technical policies and export compliance rules.
• Lead technical workshops, hands-on training, PoCs and production pilot projects for key CSP accounts; quantify and demonstrate the business value of GPU/AI Infra; and accelerate technology adoption and large-scale replication.
• Monitor cutting-edge industry trends, including Agentic AI, LLM inference optimization, cloud gaming AI, and next-gen data center system architectures, and deliver strategic technical insights to support team and product strategy.
• Mentor junior SA team members, standardize CSP technical engagement and solution delivery processes, and drive the capture and codification of high-value technical best practices.