NVIDIA Perf Analyst Intern
Requirements
• BS in Computer Science or a similar computing discipline.
• Working knowledge of all Windows operating systems.
• Knowledge of Linux and Windows.
• Good knowledge of PC systems and components.
• Good organizational, time management and task prioritization skills.
Ways to stand out from the crowd:
• Knowledge of C, C++, Perl, OpenGL, DirectX and D3D.
• Familiarity with Linux is a plus.
• Good knowledge of the NVIDIA GeForce and RTX series.
• Familiarity with Android applications is a plus.
Responsibilities
Have you ever wanted to join a groundbreaking company that is crafting the future of the tech industry? Are you a creative individual with an analytical mindset and a real passion for technology? If this sounds like you, then we want to hear from you. We are looking for a dedicated team member to join our Performance Lab team and help build and improve a world-class lab.
What you’ll be doing:
• Create, tune, run and analyze graphics and systems benchmarks on PCs, workstations and laptops. Tasks may include scripting to automate current test processes and benchmarks for improved efficiency.
• Configure computer systems with the appropriate hardware and software, then run benchmarks on them.
• Analyze the results of benchmark runs and create reports that use this evaluation to position NVIDIA products appropriately.
• Architect Performance Tooling: Develop infrastructure tools and libraries for GPU performance analysis, visualization, and automated workflows used across the GPU SW/HW development life cycle.
• Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance performance debugging and profiling capabilities.
• AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate performance optimization guidance, and improve the user experience of the profiling infrastructure.
• Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to support performance studies, improve the CUDA software stack, and co-design performance-centric solutions for current and next-generation GPU architectures.
• Design, develop, and optimize major layers in LLMs (e.g., attention, GEMM, inter-GPU communication) for NVIDIA's new architectures.
• Implement and fine-tune kernels to achieve optimal performance on NVIDIA GPUs.
• Conduct in-depth performance analysis of GPU kernels, including attention and other critical operations.
• Identify bottlenecks, optimize resource utilization, and improve throughput and power efficiency.
• Create and maintain workloads and micro-benchmark suites to evaluate kernel performance across various hardware and software configurations.
• Generate performance projections, comparisons, and detailed analysis reports for internal and external stakeholders.
• Collaborate with architecture, software, and product teams to guide the development of next-generation deep learning hardware and software.
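1. Explore and pre-research high-value technologies in end-device (terminal) storage, including file systems, the SQLite database, the block layer, and new types of storage devices.
2. Identify storage I/O performance bottlenecks in end-device scenarios and design optimization solutions.
3. Customize and optimize the storage software stack for new types of storage devices, and design and review technical solutions.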