NVIDIA Senior Solutions Architect, CSP System
Experienced hire | Full-time | Location: Shanghai | Status: Hiring
Requirements
• Bachelor’s/Master’s/PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field; equivalent industry experience is highly valued.
• 8+ years of hands-on experience in GPU architecture, AI system optimization, large-scale data center infrastructure, or hyperscale cloud computing, with solid experience in AI training/inference, distributed computing or HPC workloads.
• Deep understanding of GPU microarchitecture, CUDA programming model, GPU memory hierarchy and system scheduling mechanisms; proficient in performance profiling, bottleneck analysis and end-to-end AI workload tuning.
• Strong programming proficiency in C/C++ and Python; familiar with CUDA kernels, compiler toolchains, AI framework optimization (PyTorch/TensorRT) and large-scale distributed system tuning.
• Proven hands-on experience working with major Chinese CSPs or global hyperscalers, with in-depth knowledge of their public cloud AI service architectures, cluster operation mechanisms and core workload characteristics.
• Excellent technical communication and presentation skills, capable of explaining complex GPU system and AI infra technologies to technical engineers, architecture teams and business stakeholders.
• Strong cross-functional collaboration capability, able to work efficiently in a global matrix team and prioritize multiple high-value technical projects under fast-paced business demands.
• Familiarity with NVIDIA full-stack products (GPU data center hardware, TensorRT-LLM, Dynamo, NCCL, CUDA software stack) is a signifi…
Responsibilities
• Partner with Sales, BD and CPM teams to land NVIDIA GPU and AI infrastructure technologies in top-tier Chinese CSP accounts, driving technical penetration and sustainable business growth.
• Serve as the primary technical authority on NVIDIA GPU system and AI infrastructure solutions for Chinese CSPs, providing end-to-end consultation on GPU cluster architecture design, AI workload deployment, heterogeneous computing tuning, and full-stack software optimization.
• Unlock Vera CPU + GPU co-optimization value for RL training and Agentic AI workloads: eliminate CPU-GPU data movement bottlenecks and optimize the latency and throughput of end-to-end agent training and reasoning pipelines for CSP AI factory scenarios.
• Lead open-source system architecture contributions for NVIDIA AI infra stacks, upstream optimized patches to key open-source projects, build China-localized best practices, and shape industry technical standards.
• Conduct in-depth GPU workload bottleneck analysis; implement system-level, kernel-level and framework-level tuning for AI training, inference, RL and gaming workloads; and deliver production-ready reference designs and tuning guidelines for CSP mass deployment.
• Act as the key technical liaison between Chinese CSP customers and NVIDIA global engineering, product and R&D teams: collect high-value local workload requirements, drive product roadmap iteration, and ensure full compliance with NVIDIA global technical policies and export compliance rules.
• Lead technical workshops, hands-on training, PoC and production pilot projects for key CSP accounts; quantify and demonstrate GPU/AI Infra business value to accelerate technology adoption and large-scale replication.
• Monitor cutting-edge industry trends, including Agentic AI, LLM inference optimization, cloud gaming AI, and next-gen data center system architectures, and deliver strategic technical insights to support team and product strategy formulation.
• Mentor junior SA team members, standardize CSP technical engagement and solution delivery processes, and drive the capture and codification of high-value technical best practices.
Related materials (including English-language resources)
HPC
https://www.ibm.com/think/topics/hpc
HPC is a technology that uses clusters of powerful processors that work in parallel to process massive, multidimensional data sets and solve complex problems at extremely high speeds.
CUDA
https://developer.nvidia.com/blog/even-easier-introduction-cuda/
This post is a super simple introduction to CUDA, the popular parallel computing platform and programming model from NVIDIA.
https://www.youtube.com/watch?v=86FAWCzIe_4
Learn how to program with Nvidia CUDA and leverage GPUs for high-performance computing and deep learning.
C
https://www.freecodecamp.org/chinese/news/the-c-beginners-handbook/
This handbook follows the 80/20 rule: you will learn 80% of the C programming language in 20% of the time.
https://www.youtube.com/watch?v=87SH2Cn0s9A
https://www.youtube.com/watch?v=KJgsSFOSQv0
This course will give you a full introduction into all of the core concepts in the C programming language.
https://www.youtube.com/watch?v=PaPN51Mm5qQ
In this complete C programming course, Dr. Charles Severance (aka Dr. Chuck) will help you understand computer architecture and low-level programming with the help of the classic C Programming language book written by Brian Kernighan and Dennis Ritchie.
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, starts from zero, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
A free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in Python.
Development frameworks
[English] Understanding Modern Development Frameworks: A Guide for Developers and Technical Decision-makers
https://www.freecodecamp.org/news/understanding-modern-development-frameworks-guide-for-devs/