
NVIDIA Senior Performance Software Engineer, Deep Learning Libraries

Experienced hire · Full-time · Location: Shanghai | Beijing · Status: Hiring

Qualifications


• Master's or PhD degree, or equivalent experience, in Computer Science, Computer Engineering, Applied Math, or a related field
• 2+ years of relevant industry experience
• Demonstrated strong C++ programming and software design skills, including debugging, performance analysis, and test design
• Experience with performance-oriented parallel programming, even if it’s not on GPUs (e.g. with OpenMP or pthreads)
• Solid understanding of computer architecture and some experience with assembly programming
• Ability to identify bottlenecks, optimize resource utilization, and improve throughput (a minimal timing sketch follows this list)
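
As one concrete illustration of that last point (our own sketch, not part of the posting): the minimal CUDA example below times a bandwidth-bound kernel with CUDA events and converts the elapsed time into achieved memory throughput, which is a first step in deciding whether a kernel is a bottleneck. The kernel, problem size, and launch configuration are arbitrary placeholders.

// Minimal sketch (assumption, not from the posting): time a bandwidth-bound
// kernel with CUDA events and report achieved memory throughput.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(const float* __restrict__ in, float* __restrict__ out,
                      float alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = alpha * in[i];   // 1 load + 1 store per element
}

int main() {
    const int n = 1 << 26;                     // ~67M floats (illustrative size)
    const size_t bytes = n * sizeof(float);
    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    dim3 block(256), grid((n + block.x - 1) / block.x);
    scale<<<grid, block>>>(d_in, d_out, 2.0f, n);   // warm-up launch

    cudaEventRecord(start);
    scale<<<grid, block>>>(d_in, d_out, 2.0f, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbps = 2.0 * bytes / (ms * 1e-3) / 1e9;  // read + write traffic
    printf("%.3f ms, %.1f GB/s achieved\n", ms, gbps);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

Comparing the printed GB/s figure against the GPU's peak memory bandwidth (from the datasheet or a profiler) indicates whether the kernel is already memory-bound or still has headroom.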

Ways to stand out from the crowd:
• Tuning BLAS or deep learning library kernel code
• CUDA GPU programming
• Numerical methods and linear algebra
• LLVM, TVM tensor expressions, or TensorFlow MLIR

Responsibilities


• Writing highly tuned compute kernels to perform core deep learning operations (e.g. matrix multiplies, convolutions, normalizations); a simplified kernel of this kind is sketched after this list
• Following general software engineering best practices, including support for regression testing and CI/CD flows
• Collaborating with teams across NVIDIA:
  • the CUDA compiler team on generating optimal assembly code
  • deep learning training and inference performance teams on which layers require optimization
  • hardware and architecture teams on the programming model for new deep learning hardware features
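
To make the first responsibility concrete, here is a heavily simplified sketch of a shared-memory tiled SGEMM, the kind of compute kernel that bullet describes; it is our own illustrative example, not NVIDIA library code. Production kernels go much further (Tensor Cores, register blocking, double buffering, vectorized and asynchronous loads), so treat this as a starting point only.

// Illustrative sketch: tiled matrix multiply C = A * B with shared-memory staging.
#define TILE 16

__global__ void sgemm_tiled(const float* __restrict__ A,   // M x K, row-major
                            const float* __restrict__ B,   // K x N, row-major
                            float* __restrict__ C,         // M x N, row-major
                            int M, int N, int K) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;
        // Stage one tile of A and one tile of B in shared memory so each
        // global load is reused by the whole thread block.
        As[threadIdx.y][threadIdx.x] = (row < M && a_col < K) ? A[row * K + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (b_row < K && col < N) ? B[b_row * N + col] : 0.0f;
        __syncthreads();

        #pragma unroll
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    if (row < M && col < N)
        C[row * N + col] = acc;
}

Staging TILE x TILE blocks of A and B in shared memory lets each global load be reused TILE times, which is the basic lever such kernels pull before moving on to register blocking and Tensor Core instructions.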
Related positions

Experienced hire

• Contribute to design reviews and product feature requirements across the Ethernet/NIC/DPU/Switch portfolio. Design and build test setups and topologies with an emphasis on emulating customers' large-scale, complex environments.
• Collaborate closely with multi-functional teams, including hardware engineers, software developers, and domain experts, to deliver optimized solutions that meet the demanding requirements of AI workloads.
• Design tests and mentor the test-automation team in implementing them. Generate comprehensive test reports during release execution, assist with reproducing and debugging complex customer use cases through to root cause, and act as the engineering PIC for the full verification cycle of customer use cases.
• Complete end-to-end test scenarios across different scopes: regression, performance, functional, and scale. Report testing progress and provide summary reports of testing activity.
• Profile, benchmark, and analyze deep learning models to identify areas for optimization and improvement in performance, efficiency, and accuracy, with a strong emphasis on networking aspects.
• Provide insights and recommendations based on the analysis of large-scale training results, focusing on networking bottlenecks and optimizations, to improve model outcomes and achieve business objectives.

Updated 2025-05-21
Experienced hire

NVIDIA is now looking for LLM Training Framework Engineers for the Megatron Core team. Megatron Core is an open-source, scalable, cloud-native framework built for researchers and developers working on Large Language Model (LLM) and Multimodal (MM) foundation model pretraining and post-training. Our GenAI frameworks provide end-to-end model training, including pretraining, alignment, customization, evaluation, deployment, and tooling to optimize performance and user experience. You will build on Megatron Core's capabilities by inventing advanced distributed training algorithms and model optimizations, and collaborate with partners to implement optimized solutions.

What you’ll be doing:
• Build and develop the open-source Megatron Core.
• Tackle large-scale AI training and inference challenges across the entire model lifecycle, including orchestration, data pre-processing, model training and tuning, and model deployment.
• Work at the intersection of AI applications, libraries, frameworks, and the entire software stack.
• Spearhead advances in model architectures, distributed training strategies, and model-parallel approaches.
• Accelerate foundation model training and optimization through mixed-precision recipes and advanced NVIDIA GPU architectures.
• Tune and optimize the performance of deep learning frameworks and software components.
• Research, prototype, and develop robust and scalable AI tools and pipelines.

Updated 2025-10-13
Experienced hire

We are now looking for a Deep Learning Performance Software Engineer! We are expanding our research and development for inference and seek excellent Software Engineers and Senior Software Engineers to join our team. We specialize in developing GPU-accelerated deep learning software. Researchers around the world are using NVIDIA GPUs to power a revolution in deep learning, enabling breakthroughs in numerous areas. Join the team that builds software to enable new solutions, and collaborate with the deep learning community to implement the latest algorithms for public release in TensorRT. The ability to work in a fast-paced, customer-oriented team and excellent communication skills are essential.

What you’ll be doing:
• Develop highly optimized deep learning kernels for inference
• Perform performance optimization, analysis, and tuning
• Work with cross-collaborative teams across automotive, image understanding, and speech understanding to develop innovative solutions
• Occasionally travel to conferences and customer sites for technical consultation and training

Updated 2025-09-24
Experienced hire

We are now looking for a Deep Learning Performance Software Engineer! We are expanding our research and development for deep learning and seek excellent Software Engineers and Senior Software Engineers to join our team. We specialize in developing GPU-accelerated deep learning software. Researchers around the world are using NVIDIA GPUs to power a revolution in deep learning, enabling breakthroughs in numerous areas. Join the team that builds software to enable new solutions. The ability to work in a fast-paced, customer-oriented team and excellent communication skills are essential.

What you’ll be doing:
• Develop deep learning compilers
• Develop highly optimized deep learning kernels
• Perform end-to-end performance optimization
• Perform performance optimization, analysis, and tuning

Updated 2025-09-24