NVIDIA Senior Software and System Architect
Requirements
• B.Sc/M.Sc/PhD degree in Computer Science, Computer Engineering, or Electrical Engineering
• 6+ years of experience as SW Architect/System Architect
• 4+ years of experience as SW developer
• Deep knowledge of and experience with C and Python
• Hands-on Linux development, Docker, and container-based technologies
• Experience with cloud and data center networking
• Wide knowledge and understanding of networking protocols and common network topologies
• Strong design, coding, analytical, debugging, and problem-solving skills
• Ability to work concurrently with mu…
Responsibilities
• Lead architecture for cloud networking, including orchestration, provisioning, and security solutions
• Design state-of-the-art system architecture for DPU and NIC technologies
• Build end-to-end solutions from the application level down to HW
• Write effective, clear, and reliable architecture specifications
• Evaluate new technologies, innovate, and rapidly develop POC prototypes that can then be developed into full-fledged products and solutions
• Work closely with NVIDIA teams around the world, including SW and HW architects, R&D, product, solution architects, application and field engineers, and more
• Work with high-profile customers on advanced and future technologies and solutions
NVIDIA data center systems, such as DGX and HGX, have become core to NVIDIA's rapidly growing enterprise and cloud provider businesses. These platforms bring together the full power of NVIDIA GPUs, NVIDIA NVLink, NVIDIA InfiniBand networking, NVIDIA Grace CPUs, and a fully optimized NVIDIA AI and HPC software stack. We are hiring a Sr. Software Engineer who will help build simulators for our DGX Server platforms. Simulations play a significant role in building scalable systems at Speed of Light! You will work with world-class engineering teams across HW and SW.
What you'll be doing:
• Help architect and develop the simulation platform for next-gen NVIDIA DGX platforms.
• Build, integrate, and enhance simulator components with new HW features, and write supporting technical documents.
• Bring the full SW stack up on the DGX Simulator; work closely with hardware modeling, kernel, and platform driver teams distributed globally.
• Improve performance, fix bugs across the user and kernel stack, and automate the execution flow.
• Technical Development: Design and implement robust iOS solutions using Swift and Objective-C. Write efficient, maintainable, and scalable code that meets coding standards and best practices.
• System Architecture Support: Contribute to system architecture discussions and collaborate on developing scalable, secure, and performant solutions.
• Code Quality: Participate in code reviews to ensure quality and adherence to best practices. Assist in optimizing, debugging, and refactoring code for performance improvements.
• Collaboration: Work closely with cross-functional teams including product management, design, and other engineering groups to align on product features and technical solutions.
• Performance Optimization: Investigate performance issues, implement testing strategies, and resolve bottlenecks to ensure a smooth and responsive user experience.
• Mentorship and Guidance: Mentor junior team members, sharing knowledge and fostering a collaborative environment to elevate engineering skills across the team.
• Data-Driven Decision Making: Utilize telemetry and analytics to improve product performance and refine features based on user feedback.
• Learn to review and break down work items into tasks with stakeholder collaboration, provide estimations, and escalate delays, while also supporting feature deployments to customers, considering user and service impacts, and adhering to best deployment practices for safety.
• Collaborate with key stakeholders to define feature requirements, integrate feedback to enhance design, and establish feedback loops for continuous improvement based on customer metrics.
• Learn and apply coding standards and best practices through code reviews, developing maintainable and extensible code with guidance. Utilize debugging tools to proactively and reactively address issues in product features, ensuring code quality and reliability.
• Support the identification of dependencies and design documentation for product features, learn about system interactions and back-end dependencies, and contribute to architectural processes under guidance. Produce code to test hypotheses for technical solutions and assist with technical validation efforts. Collaborate on quality assurance plans, augment test cases, and integrate automation into testing, while understanding the implications of security and compliance in system architecture.
• Contribute to data analysis and feedback integration for product engineering decisions, acting as a Designated Responsible Individual (DRI) for monitoring and restoring system functionality within Service Level Agreement (SLA) timeframe. Participate in live service operations, and support telemetry data integration for system behavior insights, with a focus on performance, reliability, and safety.
• Develop and apply best practices for reliable code building, understand global and local regulations, customer scaling requirements, and support communication with key partners across Microsoft for user experience enhancement and partner needs.
• Ensure compliance with security, privacy, safety, and accessibility standards, leverage developer tools for code creation and debugging, contribute to automation in production and deployment, and proactively seek knowledge to improve product availability, reliability, efficiency, and performance at scale.
• As the software leader for a customer program, you will interface with key OEMs/T1s on all SW aspects of the program: tracking requirements, releases, and issues; coordinating technical workshops (including board bring-up); and handling escalations. One of the most important responsibilities of this position is to ensure that the customer's experience with the program team adds new value to their overall relationship with NVIDIA.
• Internally, you will work with other functional teams to manage software releases in support of customer requirements. This includes defining releases, driving features and bug fixes, testing, and documentation.
• Be responsible for the successful delivery of the program while working as a team with a dedicated Program Architect, who will help with the technical aspects.