NVIDIA Deep Learning Performance Architect

Experienced hire · Full-time · 2025-09-03 · Location: Shanghai · Status: Hiring

Requirements

• BSc, MS, or PhD in a relevant discipline (CS, EE, Math, etc.)
• 4+ years of work experience in relevant areas (e.g., performance modeling and optimization) is a plus
• Familiarity with deep learning platform architectures (e.g., GPUs)
• A strong b…

Responsibilities

NVIDIA is developing processors and system architectures that accelerate deep learning on edge devices, workstations, and data center GPUs for a variety of applications, including automotive, robotics, large language models (LLMs), and generative AI models. We are looking for an expert deep learning system performance architect to join our modeling, efficiency optimization, performance projection, and analysis effort. In this position, you will have the chance to optimize deep learning hardware and software architecture and make a significant impact in a dynamic, technology-focused company.
What you’ll be doing:
• Analyze performance and efficiency of various machine learning/deep learning algorithms on different architectures
• Identify architecture and software performance bottlenecks and propose optimizations
• Explore new features and hardware capabilities on deep learning applications

Related positions

Machine Learning System Engineer, Data AML, Jindouyun Talent Program (筋斗云人才计划)

Campus hire · A158012A

Team Introduction: Data AML is ByteDance's machine learning middle platform. It provides training and inference systems for recommendation, advertising, CV (computer vision), speech, and NLP (natural language processing) to businesses such as Douyin, Toutiao, and Xigua Video. AML supplies machine learning compute to internal business units and researches general-purpose, innovative algorithms for the problems those businesses face. Through Volcano Engine, it also delivers core machine learning and recommendation system capabilities to external enterprise clients. In addition, AML conducts frontier research in areas such as AI for Science and scientific computing.

Research Project Introduction: Large-scale recommendation systems are increasingly applied to short video, text community, image, and other products, and modal information plays a growing role in them. ByteDance has found in practice that modal information works well as a generalization feature for business scenarios such as recommendation, so research on end-to-end, ultra-large-scale multimodal recommendation systems has considerable potential. Building on algorithm-engineering co-design, the project expects to further explore directions such as multimodal co-training, large models with 7B/13B parameters, and end-to-end modeling of longer sequences.

Engineering research directions include:
1. Representation of multimodal samples
2. Construction of high-performance multimodal inference engines based on the PyTorch framework
3. Development of high-performance multimodal training frameworks
4. Application of heterogeneous hardware in multimodal recommendation systems

Algorithmic research directions include:
1. Design of sound recommendation-advertising and multimodal co-training architectures
2. Sparse Mixture of Experts (Sparse MoE)
3. Memory Network
4. Mixed-precision techniques

Responsibilities:
1. Design and develop machine learning system architecture, and tune system performance.
2. Solve technical challenges in high concurrency, high reliability, and high scalability.
3. Work across multiple sub-areas of machine learning systems, including resource scheduling, task orchestration, model training, model inference, model management, dataset management, workflow orchestration, and ML for System.
4. Investigate and introduce forward-looking technologies for machine learning systems, such as the latest hardware architectures, heterogeneous computing systems, and GPU optimization techniques.
5. Research machine-learning-based methods for analyzing and optimizing cluster and service resource usage.

Updated 2025-05-26 · Singapore

Machine Learning Researcher, Search, Jindouyun Talent Program (筋斗云人才计划)

Campus hire · A07472

Team Introduction: The Search Team is primarily responsible for search algorithm innovation and architecture research and development (R&D) for products such as Douyin, Toutiao, and Xigua Video, as well as businesses like E-commerce and Local Services. We use cutting-edge machine learning technologies for end-to-end modeling and continuously push for breakthroughs. We also focus on building and optimizing the performance of distributed systems and machine learning systems, from memory and disk optimization to index compression and the exploration of recall and ranking algorithms, which gives students ample opportunity to grow.

The main areas of work include:
1. Exploring cutting-edge NLP technologies: from basic tasks like word segmentation and Named Entity Recognition (NER), through text and multimodal pre-training, to business functions like query analysis and fundamental relevance modeling, we apply deep learning models throughout the pipeline, and every detail presents a challenge.
2. Cross-modal matching technologies: applying deep learning techniques that combine Computer Vision (CV) and Natural Language Processing (NLP) in search, we aim to achieve powerful semantic understanding and retrieval capabilities for multimodal video search.
3. Large-scale streaming machine learning technologies: using large-scale machine learning to address recommendation challenges in search, making search more personalized and better at understanding user needs.
4. Architecture for data at the scale of hundreds of billions: conducting in-depth research and innovation in all aspects, from large-scale offline computing and the performance and scheduling optimization of distributed systems to building high-availability, high-throughput, low-latency online services.
5. Recommendation technologies: using ultra-large-scale machine learning to build industry-leading search recommendation systems and to continuously explore and innovate in search recommendation technologies.

Project Background/Objectives: With the rapid development of large-model technology, intelligent search faces new opportunities and challenges. When confronted with massive data, multimodal information, and complex user needs, traditional search technology increasingly shows problems such as insufficient model capacity, limited semantic understanding, and low resource utilization. Building intelligent search on large models aims to raise the intelligence of the search system, improve user experience, and solve core problems such as ultra-large-scale retrieval, complex semantic understanding, and efficient resource use. Specific objectives include:
1. Explore combining large models with ranking algorithms to improve the precision of personalized ranking and the user experience.
2. Research generative retrieval algorithms to solve ultra-large-scale retrieval over candidate pools of tens of billions to hundreds of billions of items.
3. Use large language models (LLMs) to improve search satisfaction for complex, ambiguous queries.
4. Build a high-performance, low-resource, large-scale unified batch-stream retrieval and computing system to improve resource utilization.

Project Challenges/Necessity:
1. Personalized ranking: traditional ranking algorithms struggle to make full use of multimodal information (text, images, video, etc.), and their limited model complexity cannot meet user demand for precise, personalized search.
2. Ultra-large-scale retrieval: traditional discriminative models face insufficient model capacity and poor indexing efficiency when retrieving from candidate pools at the hundred-billion scale, so a new generation of retrieval algorithms is urgently needed.
3. Complex query understanding: user search needs are increasingly complex, and traditional search engines struggle to accurately understand the semantics of long, difficult sentences and ambiguous queries, which leads to low satisfaction with search results.
4. Resource utilization: the architecture that separates storage from compute in search systems leads to low resource utilization, so optimizing resource usage while preserving performance is a key problem.
Building intelligent search on large models is a necessary way to address these challenges. Introducing large-model technology can significantly improve the search system's semantic understanding, retrieval efficiency, and resource utilization, giving users a more precise and efficient search experience.

Project Content:
1. Research on large models for personalized ranking
2. Research on ultra-large-scale generative retrieval algorithms
3. Improving search satisfaction for complex, ambiguous queries with LLMs
4. A high-performance, large-scale unified batch-stream retrieval and computing system

Research directions involved: large ranking models, generative retrieval and cross-modal fusion, large language models (LLMs) and complex query understanding, high-performance computing and storage architecture.

Updated 2025-05-26 · Singapore

Recommendation Large Model Researcher, E-commerce, Jindouyun Talent Program (筋斗云人才计划)

Campus hire · A221696

Team Introduction: The Recommendation and Marketing team is primarily responsible for recommendation services for the International E-commerce Mall, covering information flow recommendation in core scenarios such as the mall homepage, transaction funnels, product detail pages, and stores and showcases. The team provides hundreds of millions of users daily with precise, personalized recommendations for products, live streams, and short videos, and works on challenging problems in modern recommendation systems. Through algorithmic innovation, we continuously improve user experience and efficiency and create greater user and social value.

Project Background/Objectives: This project aims to explore new paradigms for large models in the recommendation field, break away from the long-standing structures of recommendation models and Infra solutions, achieve significantly better performance than current baseline models, and apply the results across business scenarios such as Douyin short videos, LIVE, E-commerce, and Toutiao. Developing large models for recommendation is challenging because recommendation places high demands on engineering efficiency and because each user's recommendation experience is personalized. The project will conduct in-depth research in several directions to explore and establish large model solutions for recommendation scenarios.

Project Challenges/Necessity: LLMs in the natural language field have outperformed SOTA models on numerous vertical tasks. In contrast, industrial-grade recommendation systems have seen no major changes for a long time. This project seeks to change the long-standing paradigms of recommendation model architecture and Infra, deliver models with significantly improved performance, and apply them to scenarios like Douyin short video and LIVE. Key challenges include the high engineering efficiency requirements of recommendation systems, the personalized nature of user recommendation experiences, and effective content representation for media formats like short videos and live streams. The project will address these through in-depth research in model parameter scaling, content and user representation learning, multimodal content understanding, ultra-long sequence modeling, and generative recommendation models, driving a systematic upgrade of recommendation models.

Project Content:
1. Representation learning based on content understanding and user behavior
2. Scaling up of recommendation model parameters and compute
3. Ultra-long sequence modeling
4. Generative recommendation models

Research directions involved: recommendation algorithms, large recommendation models.

Updated 2025-05-26 · Singapore

T-Head (平头哥) Compiler Technology Expert, AI Software, Shanghai/Beijing/Hangzhou

Experienced hire · 5+ years · Technology, Chips

1. Participate in hardware/software co-design of AI chips and functional verification of the instruction set, working closely with hardware/architecture engineering and software teams to understand the hardware and software requirements.
2. Design, implement, and maintain the compiler and toolchain for AI chips, including compiler algorithms, and tune the performance of network models.
3. Design and implement the deep learning software stack.

Updated 2026-01-20 · Shanghai | Beijing | Hangzhou