
NVIDIA Deep Learning Performance Architect - Perf Tools

Experienced hire | Full-time | Location: Shanghai, Beijing | Status: Hiring

Requirements


• BS or higher in Computer Science, Electronic Engineering, or a related field (or equivalent experience)
• 4+ years of software development experience
• Strong software skills in design, coding (C++ and Python), analysis, and debugging of low-level programs
• Strong grasp of computer architecture (pipelines, memory hierarchies) and operating system fundamentals
• Experience with performance modeling, architecture simulation, profiling, and analysis
• Self-starter who thrives in dynamic environments and manages competing priorities effectively
  
Ways to stand out from the crowd:
 • Exper…

Responsibilities


• Architect Performance Tooling: Develop infrastructure tools and libraries for GPU performance analysis, visualization, and automated workflows used across the GPU SW/HW development life cycle.
• Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance performance debugging and profiling capabilities.
• AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate performance optimization guidance, and improve the user experience of the profiling infrastructure.
• Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to support performance studies, improve the CUDA software stack, and co-design performance-centric solutions for current and next-generation GPU architectures.
Related positions

NVIDIA | Experienced hire

A key part of NVIDIA's strength is our sophisticated analysis and debugging tools, which empower NVIDIA engineers to improve the performance and power efficiency of our products and the applications running on them. We are looking for forward-thinking, hard-working, and creative people to join a multifaceted software team with high standards! This software engineering role involves developing tools for AI researchers and SW/HW teams running AI workloads on GPU clusters. As a member of the software development team, you will work with users from different departments, such as Architecture and Software teams. Our work gives users intuitive, rich, and accurate insight into the workload and the system, and empowers them to find opportunities in software and hardware, build high-level models to propose and deliver the best hardware and software to our customers, and debug tricky failures and issues to help improve the performance and efficiency of the system.

What you'll be doing:
• Build internal profiling and analysis tools for AI workloads at large scale
• Build debugging tools for commonly encountered problems, such as memory or networking issues
• Create benchmarking and simulation technologies for AI systems or GPU clusters
• Partner with HW architects to propose new features or improve existing ones based on real-world use cases

Updated 2025-06-19 | Shanghai
ByteDance | Experienced hire | A102569

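1. Explore in depth the reasoning capabilities of LLMs in search scenarios and the Deep Research mode; optimize information integration and summarization; build efficient, accurate intelligent search products; and drive breakthroughs of AI technology in real-world applications.
2. AI search summarization Agent development:
   1) Design and implement an LLM-based search summarization Agent to improve the understanding, reasoning, and structured summarization of search results;
   2) Explore LLM reasoning techniques (e.g., chain of thought, multi-step reasoning) to optimize the Deep Research mode for complex queries, achieving long-text understanding and cross-document information fusion;
   3) Build an end-to-end system covering intent recognition, knowledge retrieval, result generation, and preference alignment to improve the user experience.
3. Model optimization and application:
   1) Optimize the model's adaptability to search scenarios through instruction tuning, preference alignment (RLHF/DPO), and related techniques;
   2) Explore search and generation techniques that fuse multimodal information (text, code, structured data);
   3) Research innovative application scenarios for future life (e.g., personalized knowledge assistants, automated research tools) and explore the boundaries of the technology.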

Updated 2025-03-11 | Beijing
Kuaishou | Experienced hire | 1+ years of experience | D4899

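1. Build Kuaishou's NLP technology stack, including but not limited to text classification, knowledge graphs, translation, and dialogue.
2. Communicate and collaborate with business teams to deliver core algorithm models and capabilities that meet product requirements.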

Updated 2025-04-11 | Beijing
Kuaishou | Experienced hire | D4899

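1. Research and develop the AI小快 intelligent assistant bot.
2. Optimize the base model and apply LLM-derived frameworks such as RAG and Agent to improve the relevant business metrics.
3. Continuously follow and investigate cutting-edge LLM technologies and open-source solutions, track the latest industry progress in the LLM field, advance related research, and explore the possibility of applying the latest technologies to AI小快.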

Updated 2025-04-11 | Beijing