NVIDIA: Senior Software Engineer, Enterprise AI Software

Experienced hire | Full-time | Location: Shanghai | Status: Hiring

Requirements


• A history of using advanced programming skills to build distributed compute systems, backend services, microservices, and cloud technologies.
• Experience productionizing and deploying LLMs.
• Experience working effectively with multi-functional teams, principals, and architects across organizational boundaries.
• Experience mentoring, and the ability to grow teams and team members.
• Deep technical expertise in distributed containerized applications using Docker, Kubernetes, and Helm charts.
• Passion for building scalable and performant microservice applications.
• Excellent interpersonal skills and the flexibility to lead multi-functional efforts.
• Proven experience debugging and analyzing the performance of distributed microservices or cloud systems.
• A degree in Computer Science, Computer Engineering, or a relate…

Responsibilities


• Design, build, and optimize containerized inference execution for LLM applications, ensuring efficiency and scalability. These applications may run on container orchestration platforms such as Kubernetes to enable scalable and robust deployment.
• Ensure the performance and scalability of NIMs through comprehensive performance measurement and optimization.
• Apply container expertise to create and optimize the basic building blocks of NIMs, influencing the development of many models and related products within NVIDIA.
• Collaborate, brainstorm, and improve the designs of inference solutions and APIs with a broad team of software engineers, researchers, SREs, and product management.
• Mentor and collaborate with team members and other teams to foster growth and development. Demonstrate a history of learning and enhancing both personal skills and those of colleagues.
Related positions

NVIDIA | Experienced hire

• Design, build, and harden containers for NIM runtimes and inference backends; enable reproducible, multi-arch, CUDA-optimized builds.
• Develop Python tooling and services for build orchestration, CI/CD integrations, Helm/Operator automation, and test harnesses; enforce quality with typing, linting, and unit/integration tests.
• Help design and evolve Kubernetes deployment patterns for NIMs, including GPU scheduling, autoscaling, and multi-cluster rollouts.
• Optimize container performance: layer layout, startup time, build caching, runtime memory/IO, network, and GPU utilization; instrument with metrics and tracing.
• Evolve the base image strategy, dependency management, and artifact/registry topology.
• Collaborate across research, backend, SRE, and product teams to ensure day-0 availability of new models.
• Mentor teammates; set high engineering standards for container quality, security, and operability.

Updated 2025-09-15, Shanghai
NVIDIA | Experienced hire

NVIDIA data center systems, such as DGX and HGX, have become core to NVIDIA's rapidly growing enterprise and cloud provider businesses. These platforms bring together the full power of NVIDIA GPUs, NVIDIA NVLink, NVIDIA InfiniBand networking, NVIDIA Grace CPUs, and a fully optimized NVIDIA AI and HPC software stack. We are hiring a Sr. Software Engineer who will help build simulators for our DGX Server platforms. Simulations play a significant role in building scalable systems at Speed of Light! You will work with world-class engineering teams across HW and SW.

What you’ll be doing:
• Help architect and develop the simulation platform for next-gen NVIDIA DGX platforms.
• Build, integrate, and enhance simulator components with new HW features, and write supporting technical documents.
• Bring the full SW stack up on the DGX Simulator; work closely with globally distributed hardware modeling, kernel, and platform driver teams.
• Improve performance, fix bugs across the user and kernel stack, and automate the execution flow.

Updated 2025-09-22, Shanghai | Beijing | Shenzhen
NVIDIA | Experienced hire

• Design, develop, and improve scalable infrastructure to support the next generation of AI applications, including copilots and agentic tools.
• Drive improvements in architecture, performance, and reliability, enabling teams to use LLMs and advanced agent frameworks at scale.
• Collaborate across hardware, software, and research teams, mentoring and supporting peers while encouraging best engineering practices and a culture of technical excellence.
• Stay informed of the latest advancements in AI infrastructure and contribute to continuous innovation across the organization.

Updated 2025-09-16, Shanghai
NVIDIA | Experienced hire

We are looking for someone who is passionate about quality assurance. You’ll collaborate with multi-functional groups. Senior SWQA Test Development Engineers at NVIDIA are not only manual testers; you write scripts to automate testing or build tools for the QA team, so we can improve productivity or optimize test plans. We'd like to see your ability to identify weak spots and constantly craft better and more creative test plans to break software and identify potential issues. You'll have a huge impact on the quality of NVIDIA's enterprise product.

What you’ll be doing:
• Use AI-powered tools to enhance QA efficiency, including automating test case generation, defect detection, and regression testing.
• Implement AI-driven solutions to optimize test coverage and identify high-risk areas in software systems.
• Collaborate with cross-functional teams to adopt AI tools that improve workflow automation and reduce manual effort.
• Work closely with multi-functional teams to understand the test requirements and take ownership of product quality.
• Develop test plans, design test cases, complete testing via automation and/or manually, and compose test reports.
• Build and maintain our complex test environments.
• Manage the bug lifecycle and work with other groups to drive solutions.
• Assist in architecting, crafting, and implementing SWQA test frameworks.
• Report bugs found during execution, assist with reproduction and debugging to understand the root cause, verify bug fixes provided by the R&D team, and escalate if not fixed.
• Experience in using AI development tools. Adept at creating detailed test cases, automating them, increasing code coverage, identifying valid bugs early on, and solving these bugs swiftly.

Updated 2025-11-28, Shanghai