Microsoft Principal Software Engineering Manager - GPU Inference Optimization
Qualifications
• Bachelor's degree in computer science or a related technical field AND 5+ years of technical engineering experience coding in languages including, but not limited to, C/C++, CUDA, or ROCm, or equivalent experience
• Practical experience writing new GPU kernels, beyond running GPU workloads on existing library kernels
• Quick learner with good communication (fluent in English) and solid problem-solving skills
• Cross-team collaboration skills and the desire to work in a team of researchers and developers
• Experience in low-level performance analysis and optimization, including proficiency with GPU profiling tools such as NVIDIA Visual Profiler and NVIDIA Nsight Compute, is a plus
• Familiarity with LLM inference optimization, and experience developing popular inference frameworks such as TensorRT-LLM, SGLang, …
Responsibilities
• Lead software development in C/C++, Python, and GPU languages such as CUDA, ROCm, or Triton.
• Analyze metrics and identify opportunities based on offline and online testing; develop and deliver robust and scalable solutions.
• Work with cutting-edge hardware stacks and a fast-moving software stack to deliver best-of-class inference and optimal cost.
• Engage with key partners to understand and implement inference and training optimization for state-of-the-art LLMs and other models.
THE ROLE:
The mission of the Principal Technical Lead is to orchestrate and elevate the quality, consistency, and competitiveness of AMD's GPU software ecosystem on Linux. This leader will bridge strategic objectives with technical execution across the ROCm stack and Linux driver portfolios (both packaged and inbox), ensuring a seamless, powerful, and reliable experience for developers, researchers, and enterprises choosing AMD for their accelerated computing needs.

KEY RESPONSIBILITIES:

Strategic Technical Leadership & SOW Definition:
Act as the central technical nexus between Product Management, Software Architecture, and engineering teams (kernel, ROCm, QA, support). Translate high-level product goals and market requirements into detailed, actionable, and prioritized Technical Statements of Work (SOWs) for the RSL AI validation team. Ensure validation plans are coherent, dependencies are managed, and resources are aligned to deliver on strategic commitments for both Radeon and Ryzen AI solutions.

Quality, Test & Process Optimization:
Own the definition and evolution of the product quality bar for AMD's Linux GPU software. Champion and drive the implementation of a robust, scalable, and automated CI/CD and test infrastructure across Native Linux, WSL, and various hardware platforms. Establish key performance indicators (KPIs) for software quality, release velocity, and regression rates. Use data to drive continuous improvement in development and testing efficiency.

Unified User Experience & Competitive Analysis:
Define and monitor a holistic user experience (UX) scorecard encompassing installation, performance predictability, documentation, and debugging. Institute a formal, ongoing competitive analysis framework to benchmark the AMD software stack (ROCm + Drivers) against key competitors across performance, feature parity, stability, and usability. Serve as the ultimate internal advocate for the end user, ensuring customer and community feedback is systematically integrated into the development lifecycle.

Linux Ecosystem & Driver Consistency:
Provide technical guidance and oversight to ensure flawless synchronization between the AMD packaged driver and the upstream Linux kernel (inbox) driver. Strengthen AMD's partnership with the Linux kernel community and major distributions (e.g., Canonical, Red Hat, SUSE). Drive a consistent and high-quality user experience regardless of the driver delivery channel (OS vendor vs. AMD.com).
Responsibilities:
The role is responsible for designing and implementing talent solutions to meet Alibaba Cloud's explosive business growth. This includes:
● Designing and executing a comprehensive international talent acquisition strategy to meet the organization's evolving needs.
● Collaborating closely with hiring managers and partners to understand staffing needs, and providing strategic guidance on best practices in talent acquisition.
● Sourcing, qualifying, and hiring across countries and business units, primarily for Business Development, Pre-Sales, and AI technology related roles; helping to drive our solutions and products to expand our business globally and develop Alibaba Cloud's cutting-edge technologies globally.
● Articulating the risks and benefits of a hire and mitigating hiring risk by identifying the pros and cons of each hire, especially for senior-level positions; advising on the principles and methods for making the right hiring decision; and enabling hiring managers and other interviewers to improve their skill set by providing training, sharing sessions, and case studies.
● Creating robust pipelines, leveraging existing talent programs and initiatives to attract potential candidates, and enhancing Alibaba's employer brand by spreading our talent strategy through all channels, social networking, and talent events.
● Being an ambassador for Alibaba Cloud, providing a great candidate experience that lends genuine insight into our culture and what it's like to really work here.
● Effectively managing all resources, systems, and tools, and working to improve existing products and processes to drive recruiting efficiency and innovation.