NVIDIA Software Engineer, cuDNN - Deep Learning
Requirements
• M.S. degree in computer science (or a similar field) or equivalent experience.
• 2+ years of relevant work or research experience.
• Strong programming skills in C/C++ development, work experience with CUDA development, and familiarity with Python.
• Good understanding of linear algebra.
• Familiarity with the latest trends in machine learning.
• Experience designing high-level software architecture.
• Good problem-solving skills, including applications of algorithms and data structures.
• Experience with performance analysis, profiling, and code optimization.

Ways to stand out from the crowd:
• GPU programming and optimization expertise (e.g., CUDA or OpenCL).
• Practical experience with machine learning, especially deep learning.
• Experience with computer architecture and building performance models for CPUs, GPUs, or other accelerators.
• Familiarity with MLIR development and compiler optimization.
Responsibilities
• Develop production-quality software that ships as part of NVIDIA's AI software stack, including optimized large language model (LLM) support.
• Analyze the performance of important workloads, tune our current software, and propose improvements for future software.
• Work with cross-functional teams of deep learning software engineers and GPU architects to innovate across applications like generative AI, autonomous driving, computer vision, and recommender systems.
• Adapt to the constantly evolving AI industry by being agile and excited to contribute across the codebase, including API design, software architecture, performance modeling, testing, and GPU kernel development.
• Work directly with key application developers (especially LLM developers) to understand the current and future problems they are solving, and create and optimize core parallel algorithms and data structures to provide the best solutions using GPUs, through both library development and direct contribution to the applications. This includes training and inference optimization for large language models and direct contribution to frameworks such as Megatron, TRTLLM, SGLang, vLLM...
• Collaborate closely with the architecture, research, libraries, tools, and system software teams at NVIDIA to influence the design of next-generation architectures, software platforms, and programming models, including by investigating the impact on application performance and developer productivity.
• Engage in deep optimization of high-performance operators, including but not limited to CUDA deep optimization and instruction and compiler optimization. These optimizations will directly support customers or be integrated into products like cuDNN, cuBLAS, and CUTLASS...
• Some travel is required for conferences and for on-site visits with developers.
