英伟达Deep Learning Performance Architect - Perf Tools
任职要求
• BS+ in Computer Science, Electronic Engineering or related (or equivalent experience) • 4+ years of software development • Strong software skill in design, coding (C++ and Python), analytical and debugging in low-level program • Strong grasp of computer architecture (pipelines, memory hierarchies) and operating system fundamentals • Experience with performance modeling, architecture simulation, profiling, and analysis. • Self-starter who thrives in dynamic environments and manages competing priorities effectively. Ways to stand out from the crowd: • Exper…
工作职责
• Architect Performance Tooling: Develop infrastructure tools/libraries for GPU performance analysis, visualization, and automated workflows used across GPU SW/HW development life cycle. • Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance perf debug and profiling capabilities. • AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate perf optimization guidance, and improve user experience of profiling infrastructure. • Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to support performance study, improve CUDA software stack, and co-design performance-centric solutions for current and next-generation GPU architecture
A key part of NVIDIA's strength is our sophisticated analysis / debugging tools that empower NVIDIA engineers to improve perf and power efficiency of our products and the running applications. We are looking for forward-thinking, hard-working, and creative people to join a multifaceted software team with high standards! This software engineering role involves developing tools for AI researchers and SW/HW teams running AI workload in GPU cluster.As a member of the software development team, we will work with users from different departments like Architecture teams, Software teams. Our work brings the users intuitive, rich and accurate insight in the workload and the system, and empower them to find opportunities in software and hardware, build high level models to propose and deliver the best hardware and software to our customers, or debugging tricky failures and issues to help improve the performance and efficiency of the system. What you’ll be doing: • Build internal profiling and analysis tools for AI workloads at large scale • Build debugging tools for common encountered problems like memory or networking • Create benchmarking and simulation technologies for AI system or GPU cluster • Partner with HW architects to propose new features or improve existing features with real world use cases
1、深入探索LLM在搜索场景中的推理能力与深度研究(Deep Research)模式,优化信息整合与总结效果,打造高效、精准的智能搜索产品,推动AI技术在实际应用中的突破; 2、AI搜索总结Agent研发: 1)设计并实现基于LLM的搜索总结Agent,提升搜索结果的理解、推理与结构化总结能力; 2)探索LLM Reasoning技术(如思维链、多步推理),优化复杂查询的Deep Research模式,实现长文本理解与跨文档信息融合; 3)构建端到端系统,涵盖意图识别、知识检索、结果生成与偏好对齐,提升用户体验; 3、模型优化及应用: 1)通过指令微调(Instruction Tuning)、偏好对齐(RLHF/DPO)等技术优化模型在搜索场景的适应性; 2)探索多模态信息(文本、代码、结构化数据)融合的搜索与生成技术; 3)研究未来生活中的创新应用场景(如个性化知识助手、自动化研究工具),探索技术边界。
1、负责搭建快手NLP技术体系,包括但不限于文本分类、知识图谱、翻译、对话等; 2、与业务部门进行沟通与协作,交付满足产品需求的核心算法模型与能力。