NVIDIA Senior LLM Training Framework Engineer

Experienced hire | Full-time | 2025-10-13 | Location: Shanghai | Status: Open

Requirements

• MS, PhD or equivalent experience in Computer Science, AI, Applied Math, or related fields and 5+ years of industry experience.
• Experience with AI training frameworks (e.g., PyTorch, JAX), and/or inference and deployment environments (e.g., TRTLLM, vLLM, SGLang).
• Proficiency in distributed training.
• Proficient in Python programming, software development, debugging, performance analysis, test design, and documentation.
• CUDA or collective programming skills are a big plus.
• Consistent record of working effectively across multiple engineering initiatives and improving AI libraries with innovations.
• S…

Responsibilities

NVIDIA is now looking for LLM Training Framework Engineers for the Megatron Core team. Megatron Core is an open-source, scalable, cloud-native framework built for researchers and developers working on Large Language Model (LLM) and Multimodal (MM) foundation model pretraining and post-training. Our GenAI Frameworks provide end-to-end model training, including pretraining, alignment, customization, evaluation, deployment, and tooling to optimize performance and user experience. In this role you will build on Megatron Core's capabilities by inventing advanced distributed training algorithms and model optimizations, and collaborate with partners to implement optimized solutions.
What you’ll be doing:
• Build and develop the open-source Megatron Core.
• Tackle large-scale AI training and inference challenges across the entire model lifecycle, including orchestration, data pre-processing, model training and tuning, and model deployment.
• Work at the intersection of AI applications, libraries, frameworks, and the entire software stack.
• Spearhead advancements in model architectures, distributed training strategies, and model parallel approaches.
• Accelerate foundation model training and optimization through mixed-precision recipes and advanced NVIDIA GPU architectures.
• Tune and optimize the performance of deep learning frameworks and software components.
• Research, prototype, and develop robust and scalable AI tools and pipelines.

Related positions

Applied Scientist 2 (Ads)

Experienced hire | Research

Bringing the State of the Art to Products

Collaborates with and bridges the gap between researchers (in the community, Microsoft Research [MSR], or in their own organizations) and development teams. Brings new technology and approaches into production by applying long-term research efforts to solve immediate product needs.

With limited guidance from others, works to create product impact. Identifies approach, and applies, improves, or creates a research-backed solution (e.g., novel, data-driven, scalable, extendable) to positively impact a Microsoft product or service. Solves components or aspects of a problem as assigned by a senior team member. May publish research to promote receiving new intellectual property for product impact.

Participates in collaborative relationships with relevant product and business groups inside or outside of Microsoft and provides expertise or technology to create business impact. Participates in technology transfer attempts, filing patents, authoring white papers, developing or maintaining tools/services for internal Microsoft use, or consulting for product or business groups. May publish research to promote receiving new intellectual property for business impact.

Capability Management and Networking

Maintains ties with external network of peers and identifies prospective talent, when asked. May contribute to publications on research findings. May participate in candidate interviews. Collaborates with the academic community to develop the recruiting pipeline and establish awareness of their work.

Reinforces a positive environment by applying best practices. May support mentorship by assisting with onboarding of research interns or other entry-level team members, if applicable.

Documentation

Performs documentation of work in progress, experimentation results, plans, etc. Documents scientific work to ensure process is captured. Participates in the creation of informal documentation and may share findings to promote innovation within the group.

Ethics and Privacy

Understands and follows ethics and privacy policies when executing research processes and/or collecting data/information.

Leveraging Applied Research

Applies strategy by understanding the role in the team and applying the strategy provided by senior team members and incorporates state-of-the-art research. Asks probing questions to better understand strategy.

Researches and develops an understanding of tools, technologies, and methods being used in the community that can be utilized to improve product quality, performance, or efficiency. Contributes knowledge around several specialized tools/methods to support the application of business impact or serves as an expert in a deeply specialized area.

Gains deep knowledge in a service, platform, or domain and acquires knowledge of changes in industry trends and advances in applied technologies. Consults with engineers and product teams to apply advanced concepts to product needs. Learns product domain by reviewing products.

Machine Learning Functionality, Insights, and Technical Tools

Prepares data to be used for analysis by reviewing criteria that reflect quality and technical constraints. Reviews data and suggests data to be included and excluded. Describes actions taken to address data quality problems. Assists with the development of useable datasets for modeling purposes. Supports the scaling of feature ideation and data preparation. Helps take cleaned data and adapts for machine learning purposes, under the direction of a senior team member. Seeks guidance from senior team members when confronted with problems/challenges.

Uses machine learning algorithms that structure, analyze, and use data in products and platforms to train algorithms for scalable artificial intelligence solutions before deploying. Begins to develop new machine learning improvements independently while under the direction of a senior team member.

Collaborates to leverage data to identify pockets of opportunity to apply state-of-the-art algorithms to improve a solution to a business problem. Uses statistical analysis tools for evaluating Machine Learning models and validating assumptions about the data while also reviewing consistency against other sources. Begins to independently run basic descriptive, diagnostic, predictive, and prescriptive statistics. Assists with the communication of insights under the direction of senior team members.

Supports the application and use of intelligence created during the training of algorithms for deployment. Seeks information about large-scale computing frameworks, data analysis systems, and modeling environments to improve models. Helps create a model, apply the model to real products, and then verify effects through iterations. Helps with experiments by putting multiple models in production and evaluating their performance. Sets up monitoring and implementation to track production models, under the direction of a senior team member. Addresses models when they break, under the direction of others.

Leverages or designs and uses machine learning/data extraction, transformation, and loading (ETL) of pipelines (e.g., data collection, cleaning) based on data prepared.

Updated 2025-09-17 | Beijing

Deep Learning Senior Engineer, End-To-End Autonomous Driving

Experienced hire

N/A

Updated 2025-09-05 | Beijing | Shanghai

Senior Prediction and Planning Engineer, VLM - Autonomous Vehicles

Experienced hire

N/A

Updated 2025-08-25 | Shanghai | Beijing

Senior Applied Scientist, Generative AI Innovation Center

Experienced hire | Applied

Are you looking to work at the forefront of Machine Learning and AI? Would you be excited to apply Generative AI algorithms to solve real world problems with significant impact? The Generative AI Innovation Center helps AWS customers implement Generative AI solutions and realize transformational business opportunities. This is a team of strategists, scientists, engineers, and architects working step-by-step with customers to build bespoke solutions that harness the power of generative AI. Starting in 2024, the Innovation Center launched a new Custom Model and Optimization program to help customers develop and scale highly customized generative AI solutions. The team helps customers imagine and scope bespoke use cases that will create the greatest value for their businesses, define paths to navigate technical or business challenges, develop and optimize models to power their solutions, and make plans for launching solutions at scale. The GenAI Innovation Center team provides guidance on best practices for applying generative AI responsibly and cost efficiently. You will work directly with customers and innovate in a fast-paced organization that contributes to game-changing projects and technologies. You will design and run experiments, research new algorithms, and find new ways of optimizing risk, profitability, and customer experience. We’re looking for Applied Scientists capable of using GenAI and other techniques to design, evangelize, and implement state-of-the-art solutions for never-before-solved problems. 
As an Applied Scientist, you will:
• Collaborate with AI/ML scientists and architects to research, design, develop, and evaluate generative AI solutions to address real-world challenges
• Interact with customers directly to understand their business problems, aid them in implementation of generative AI solutions, brief customers and guide them on adoption patterns and paths to production
• Help customers optimize their solutions through approaches such as model selection, training or tuning, right-sizing, distillation, and hardware optimization
• Provide customer and market feedback to product and engineering teams to help define product direction

Updated 2026-01-13 | Shanghai | Beijing