
NVIDIA AI Computing Development Engineer, TensorRT-LLM

Experienced hire · Full-time · Location: Shanghai · Status: Hiring

Requirements


• Master's or higher degree in Computer Engineering, Computer Science, Applied Mathematics, or a related computing-focused field (or equivalent experience)
• 2+ years of relevant software development experience
• Excellent C/C++ or Python programming and software design skills, including debugging, performance analysis, and test design
• Strong curiosity about artificial intelligence and awareness of the latest developments in deep learning, such as LLMs and generative models
• Experience working with deep learning frameworks such as PyTorch, TensorRT-LLM, NeMo, or vLLM
• Proactive and able to work without supervision
• Excellent written and oral communication skills in English

Responsibilities


• Craft and develop robust inferencing software that scales across multiple platforms for functionality and performance
• Performance analysis, optimization, and tuning
• Closely follow academic developments in artificial intelligence and bring new features to TensorRT-LLM
• Provide feedback into architecture and hardware design and development
• Collaborate across the company to guide the direction of machine learning inferencing, working with software, research, and product teams
• Publish key results at scientific conferences
Keywords (English materials included): C, Python, PyTorch, TensorRT, LLM, vLLM
Related positions

NVIDIA · Experienced hire

NVIDIA has continuously reinvented itself over two decades. NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. This is our life's work: to amplify human imagination and intelligence.

AI is becoming increasingly important in autonomous driving and AI cities. NVIDIA is at the forefront of this revolution and provides powerful solutions for both, all built on GPU-accelerated libraries such as CUDA, TensorRT, and LLM inference frameworks. We are now looking for an LLM inference framework development engineer based in Shanghai.

What you'll be doing:
• Craft and develop robust inferencing software that scales across multiple platforms for functionality and performance
• Performance analysis, optimization, and tuning
• Closely follow academic developments in artificial intelligence and contribute feature updates
• Collaborate across the company to guide the direction of machine learning inferencing, working with software, research, and product teams

Updated 2025-09-29
NVIDIA · Experienced hire

• Define the end-to-end technical architecture for the NIM Factory, from container build systems and CI/CD to Kubernetes deployment patterns and runtime optimization.
• Drive technical strategy and roadmap, making high-impact decisions on frameworks, technologies, and standards that empower dozens of engineering teams.
• Architect and influence the design of workflow orchestration systems that underpin the NIM Factory.
• Coach and mentor senior engineers across the organization, fostering a culture of technical excellence, innovation, and knowledge sharing.
• Champion best practices in software development, including API design, automation, observability, and secure supply chain management.
• Collaborate with leadership across research, backend, SRE, and product to align technical vision with product goals and influence technical roadmaps.

Updated 2025-09-18
Amazon · Experienced hire · Solution

• As an AIML Specialist Solutions Architect (SA) in AI Infrastructure, you will serve as the Subject Matter Expert (SME) for providing optimal solutions for model training and inference workloads that leverage Amazon Web Services accelerator computing services. As part of the Specialist Solutions Architecture team, you will work closely with other Specialist SAs to enable large-scale customer model workloads and drive the adoption of AWS EC2, EKS, ECS, SageMaker, and other computing platforms for GenAI practice.
• You will interact with other SAs in the field, providing guidance on their customer engagements, and you will develop white papers, blogs, reference implementations, and presentations to enable customers and partners to fully leverage AI Infrastructure on Amazon Web Services. You will also create field enablement materials for the broader SA population to help them understand how to integrate Amazon Web Services GenAI solutions into customer architectures.
• You must have deep technical experience working with technologies related to Large Language Models (LLMs), Stable Diffusion, and many other SOTA model architectures, spanning model design, fine-tuning, distributed training, and inference acceleration. A strong machine learning development background is preferred, in addition to experience with application building and architecture design. You will be familiar with the NVIDIA ecosystem and related technical options, and will leverage this knowledge to help Amazon Web Services customers in their selection process.
• Candidates must have great communication skills and be very technical and hands-on, with the ability to impress Amazon Web Services customers at any level, from ML engineers to executives. Previous experience with Amazon Web Services is desired but not required, provided you have experience building large-scale solutions.
• You will get the opportunity to work directly with senior engineers at customers, partners, and Amazon Web Services service teams, influencing their roadmaps and driving innovations.

Updated 2025-07-18
AMD · Experienced hire · Engineer

THE ROLE: MTS software development engineer on teams building and optimizing deep learning applications and AI frameworks for AMD GPU compute platforms. Work as part of an AMD development team and the open-source community to analyze, develop, test, and deploy improvements that make AMD the best platform for machine learning applications.

THE PERSON: Strong technical and analytical skills in C++ development in a Linux environment. Able to work as part of a team while also working independently, defining goals and scope, and leading your own development effort.

KEY RESPONSIBILITIES:
• Optimize deep learning frameworks: in-depth experience enhancing and optimizing frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories.
• Develop GPU kernels: create and optimize GPU kernels to maximize performance for specific AI operations.
• Develop and optimize models: design and optimize deep learning models specifically for AMD GPU performance.
• Collaborate with GPU library teams: work closely with internal teams to analyze and improve training and inference performance on AMD GPUs.
• Collaborate with open-source maintainers: engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream.
• Work in distributed computing environments: optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems.
• Utilize cutting-edge compiler tech: leverage advanced compiler technologies to improve deep learning performance.
• Optimize the deep learning pipeline: enhance the full pipeline, including integrating graph compilers.
• Software engineering best practices: apply sound engineering principles to ensure robust, maintainable solutions.

Updated 2025-09-17