NVIDIA Senior Solutions Architect - KV Cache and AI Storage
Requirements
• Bachelor's degree or higher in Computer Science or a related field, with a strong systems or storage background.
• 5+ years of relevant experience, including 2+ years focused on KV stores/caches or storage backends.
• Hands-on experience with distributed storage, caching, or large-scale backend systems.
• Solid understanding of Transformer / LLM inference and KV cache concepts, plus experience with at least one LLM serving stack (for example vLLM, TensorRT-LLM, or SGLang).
• Strong knowledge of NVMe SSDs, KV SSDs, and modern storage servers, including controller/firmware behavior and I/O characteristics.
• Practical experience with tiered memory and KV cache optimizations such as offloading (HBM → DRAM → NVMe), eviction/selection strategies, compression/quantization, or attention-level optimizations.
• Familiarity with at least one large-scale storage or caching system (such as Ceph, Redis, Cassandra, RocksDB-based KV, object storage, or distributed logs).
Ways to stand out from the crowd: …
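The tiered offloading pattern mentioned above (HBM → DRAM → NVMe) can be illustrated with a minimal two-tier sketch: a small "fast" tier spills least-recently-used entries into a larger "slow" backing tier, and promotes them back on access. Tier names, capacities, and the LRU policy here are illustrative assumptions, not any specific NVIDIA product's behavior.

```python
from collections import OrderedDict

class TieredKVCache:
    """Minimal two-tier KV cache sketch (illustrative only).

    The 'fast' tier stands in for HBM/DRAM, the 'slow' dict for an NVMe
    backing store; eviction is plain LRU.
    """

    def __init__(self, fast_capacity: int):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # LRU order: oldest entry first
        self.slow = {}             # unbounded backing tier

    def put(self, key, value):
        self.fast[key] = value
        self.fast.move_to_end(key)
        # Spill least-recently-used entries down to the slow tier
        while len(self.fast) > self.fast_capacity:
            old_key, old_val = self.fast.popitem(last=False)
            self.slow[old_key] = old_val

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # refresh LRU position
            return self.fast[key]
        if key in self.slow:
            value = self.slow.pop(key)
            self.put(key, value)        # promote back to the fast tier
            return value
        return None
```

Real deployments layer selection policies, compression/quantization, and asynchronous I/O on top of this basic spill/promote loop.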
Responsibilities
• Lead technical exploration with customer architects to understand models, frameworks, SLOs, and KV cache usage patterns.
• Build end-to-end KV cache solutions using tiered memory and NVIDIA modern networking technologies.
• Analyze performance profiles, identify bottlenecks, and drive PoCs and benchmarks to validate improvements.
• Translate customer difficulties into clear feature requests and roadmap input for NVIDIA products.
• Build reference architectures and best-practice guides, and deliver tech talks to support our field teams and customers.
• Define the end-to-end technical architecture for the NIM Factory, from container build systems and CI/CD to Kubernetes deployment patterns and runtime optimization.
• Drive technical strategy and roadmap, making high-impact decisions on frameworks, technologies, and standards that empower dozens of engineering teams.
• Architect and influence the design of workflow orchestration systems that underpin the NIM Factory.
• Coach and mentor senior engineers across the organization, fostering a culture of technical excellence, innovation, and knowledge sharing.
• Champion best practices in software development, including API design, automation, observability, and secure supply chain management.
• Collaborate with leadership across research, backend, SRE, and product to align technical vision with product goals and influence technical roadmaps.
• Build AI/HPC infrastructure for new and existing customers.
• Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, real-time monitoring, logging, and alerting.
• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
• Provide feedback to internal teams, such as opening bugs, documenting workarounds, and suggesting improvements.
• Design, implement, and optimize scalable ML training pipelines for training multimodal foundation models for robotics.
• Collaborate with researchers to integrate cutting-edge model architectures into scalable training pipelines.
• Implement scalable data loaders and preprocessors for multimodal datasets, such as videos, text, and sensor data.
• Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets.
• Develop robust monitoring and debugging tools to ensure the reliability and performance of training workflows on large GPU clusters.
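The multimodal data-loader responsibility above can be sketched as a simple per-modality collation step: samples arrive as dicts keyed by modality, and the loader groups them into fixed-size batches with one list per modality. The field names (`video`, `text`) are illustrative assumptions; production loaders would add shuffling, prefetching, and tensor conversion.

```python
from itertools import islice

def batched_multimodal_loader(samples, batch_size):
    """Minimal multimodal batching sketch (illustrative only).

    Each sample is a dict mapping modality name -> payload; batches are
    dicts mapping modality name -> list of payloads.
    """
    it = iter(samples)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        # Collate: gather each modality across the batch into one list
        yield {key: [s[key] for s in batch] for key in batch[0]}
```

A real pipeline would wrap this in a framework-native dataset (e.g. an iterable dataset with worker sharding) rather than a plain generator.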