Microsoft: Principal Software Engineering Manager - GPU Inference Optimization

Experienced hire · Full-time · Software Engineering · Location: Beijing · Status: Hiring

Qualifications


• Bachelor's degree in computer science or a related technical field AND 5+ years of technical engineering experience coding in languages including, but not limited to, C/C++, CUDA, or ROCm, OR equivalent experience

• Practical experience writing new GPU kernels, beyond running GPU workloads on existing library kernels

• Quick learning, good communication (fluent in English) and solid problem-solving skills

• Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers

• Experience in low-level performance analysis and optimization, including proficiency with GPU profiling tools such as NVIDIA Visual Profiler and NVIDIA Nsight Compute, is a plus

• Familiarity with LLM inference optimization is a plus, as is experience developing popular inference frameworks such as TensorRT-LLM, SGLang, or vLLM

 

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.

 

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

Responsibilities


• Lead software development in C/C++, Python, and GPU languages such as CUDA, ROCm, or Triton.

• Analyze metrics and identify opportunities based on offline and online testing; develop and deliver robust, scalable solutions.

• Work with cutting-edge hardware stacks and a fast-moving software stack to deliver best-in-class inference at optimal cost.

• Engage with key partners to understand and implement inference and training optimizations for state-of-the-art LLMs and other models.
Related positions

AMD · Experienced hire · Enginee

THE ROLE: The mission of the Principal Technical Lead is to orchestrate and elevate the quality, consistency, and competitiveness of AMD's GPU software ecosystem on Linux. This leader will bridge strategic objectives with technical execution across the ROCm stack and Linux driver portfolios (both packaged and inbox), ensuring a seamless, powerful, and reliable experience for developers, researchers, and enterprises choosing AMD for their accelerated computing needs.

KEY RESPONSIBILITIES:

Strategic Technical Leadership & SOW Definition: Act as the central technical nexus between Product Management, Software Architecture, and engineering teams (kernel, ROCm, QA, support). Translate high-level product goals and market requirements into detailed, actionable, and prioritized Technical Statements of Work (SOWs) for the RSL AI validation team. Ensure validation plans are coherent, dependencies are managed, and resources are aligned to deliver on strategic commitments for both Radeon and Ryzen AI solutions.

Quality, Test & Process Optimization: Own the definition and evolution of the product quality bar for AMD's Linux GPU software. Champion and drive the implementation of a robust, scalable, and automated CI/CD and test infrastructure across Native Linux, WSL, and various hardware platforms. Establish key performance indicators (KPIs) for software quality, release velocity, and regression rates. Use data to drive continuous improvement in development and testing efficiency.

Unified User Experience & Competitive Analysis: Define and monitor a holistic user experience (UX) scorecard encompassing installation, performance predictability, documentation, and debugging. Institute a formal, ongoing competitive analysis framework to benchmark the AMD software stack (ROCm + Drivers) against key competitors across performance, feature parity, stability, and usability. Serve as the ultimate internal advocate for the end-user, ensuring customer and community feedback is systematically integrated into the development lifecycle.

Linux Ecosystem & Driver Consistency: Provide technical guidance and oversight to ensure flawless synchronization between the AMD packaged driver and the upstream Linux kernel (inbox) driver. Strengthen AMD's partnership with the Linux kernel community and major distributions (e.g., Canonical, Red Hat, SUSE). Drive a consistent and high-quality user experience regardless of the driver delivery channel (OS vendor vs. AMD.com).

Updated 2025-09-24

Microsoft · Experienced hire · Research

• Initiate and drive research to advance the state of the art in AI for software engineering

• Collaborate across disciplines with product teams across Microsoft and GitHub

• Stay up to date with the research literature and product advances in AI for software engineering

• Collaborate with world-renowned experts in programming tools and developer tools to integrate AI across the software development stack for Copilot

• Build and manage large-scale AI experiments and models

Updated 2025-07-21

Microsoft · Experienced hire · Research

• Technical Architecture Design: Develop and execute system architecture and technical roadmaps to ensure the system's high availability, scalability, and security.

• Cross-Team Collaboration: Work closely with product managers, UX designers, data scientists, and other team members to understand business requirements and translate them into technical solutions.

• Continuous Improvement and Optimization: Monitor system performance, optimize performance, and troubleshoot issues to ensure stable and efficient system operation.

• Technical Innovation: Stay attuned to industry trends and new technologies, actively promoting innovation and the adoption of best practices.

• Quality Assurance: Establish and enforce standards for code reviews, unit testing, and integration testing to ensure high code quality and system reliability.

Updated 2025-09-03

Microsoft · Experienced hire · Software

Architect, build, and optimize secure and performant AI platform services that power Microsoft Copilot and other next-generation AI scenarios. Provide technical leadership across teams to define long-term architectural direction and drive engineering excellence. Collaborate with infrastructure, platform, product, and research teams to design and deliver scalable, production-grade AI services. Write high-quality, well-tested, secure, and maintainable code and promote high standards across the team. Tackle technically ambiguous or cross-boundary problems, remove roadblocks, and drive delivery across multiple teams or organizations. Lead technical design discussions, mentor senior engineers, and foster a strong engineering culture within the team. Embody Microsoft’s Culture and Values, and help shape the direction of the engineering team and broader organization.

Updated 2025-09-19