NVIDIA Networking Solution Test Engineer - AI IB and Ethernet Cluster Debugging

Experienced hire · Full-time · Location: Shanghai | Beijing · Status: Open

Requirements


• B.A./B.Sc. in Computer Science, Electrical Engineering, or equivalent IT/Network/Systems experience.
• 2+ years of hands‑on networking or system‑level testing and debugging on Linux.
• Strong Linux networking and debugging skills (for example perf, tcpdump, ethtool, iproute2).
• Proven production‑grade debugging experience: forming hypotheses, running experiments, and driving issues to root cause under pressure.
• Expertise in host‑side NIC validation and tuning (offloads, queues, interrupts, firmware/driver interactions).
• Strong knowledge of AI networking libraries (such as NCCL) and protocols (such as RoCE and RDMA), including performance and correctness debugging.
• Ability to read and reason about source code (C/C++/Python or similar) and collaborate closely with developers on fixes.
• Solid scripting and automation skills with Bash / Python / Ansible for setup, log collection, and experiment o…

Responsibilities


We are looking for a networking test engineer with strong system‑level debugging skills to join our End‑to‑End Verification team. You will work on cutting‑edge Ethernet‑based AI clusters, owning complex issues across hardware, system software and AI workloads. 
What you’ll be doing:
• Design and review test and product requirements across the InfiniBand / Ethernet / NIC / DPU / Switch portfolio, focusing on large‑scale AI cluster behavior.
• Build and maintain realistic customer‑like testbeds, including heterogeneous hardware, OS / driver combinations and complex network fabrics.
• Own end‑to‑end cluster troubleshooting: reproduce customer scenarios, triage across the stack and drive issues to root cause and fix.
• Read and understand relevant source code to identify defects, validate fixes and improve logging and instrumentation.
• Collaborate closely with development teams to debug NCCL, RoCE/RDMA and related networking components using logs, code inspection and targeted experiments.
• Define tests and guide the automation team to implement robust suites that produce actionable logs, metrics and traces.
• Run regression, performance, functional and scale testing, analyze the results and provide clear, data‑driven reports to stakeholders.
• Profile and benchmark deep learning training and inference workloads, correlating model‑level metrics with system and network telemetry to uncover bottlenecks.
Tags: English materials included · Linux · Perf · NCCL · C · …
Related positions

Alibaba
Experienced hire · LAZADA

The role: LazPay is the payments and financial services arm of Lazada and is part of the Financial and Payment Business organization of Alibaba International Digital Commerce (AIDC). LazInsure is a growing business unit that focuses on providing embedded insurance solutions within AIDC's e-commerce platforms. We are looking for a customer-centric and motivated individual to help with operations and insights that drive the best user experience for Lazada buyers who purchase our insurance products.

Key Tasks & Responsibilities:
• Lead the identification and development of business opportunities through market opportunity sizing and the development of business cases.
• Own the end-to-end process of operating and launching insurance products for Lazada in the Philippines.
• Deliver sales of products, programs, and services through embedded sales on the e-commerce platform and other channels, as measured by net revenue.
• Set up customer segmentation, strategy, and retention, including insights and business intelligence, with support from the internal Business Intelligence team.
• Lead the expansion of products and services through partnerships and vendor selection.
• Regularly monitor the market and benchmark against competitors, track changes in the regulatory landscape, and follow InsurTech developments locally and globally.
• Oversee communication with internal units regarding updates or changes to the products, and manage key stakeholders within Alibaba International Digital Commerce.
• Manage relationships with insurance partners, assist with the preparation of contractual agreements, and ensure that these agreements are fully adhered to.
• Drive win-win negotiations between Lazada and insurance partners.
• Contribute to category performance management and deliver insights to senior management.
• Prepare for and assist with the implementation of regular campaigns and initiatives.
• Identify and champion product improvements.
• Perform any other duties that may be assigned.

Updated 2025-05-29 · Manila
Alibaba Cloud
Experienced hire · General Functions - Human Resources

Responsibilities: Design and implement talent solutions to support Alibaba Cloud's explosive business growth. This includes:
● Designing and executing a comprehensive international talent acquisition strategy to meet the organization’s evolving needs.
● Collaborating closely with hiring managers and partners to understand staffing needs, and providing strategic guidance on best practices in talent acquisition.
● Sourcing, qualifying, and hiring across countries and business units, primarily for Business Development, Pre-Sales, and AI technology roles, to help expand our solutions and products globally and develop Alibaba Cloud's cutting-edge technologies worldwide.
● Articulating the risks and benefits of each hire and mitigating hiring risk by identifying the pros and cons of each candidate, especially for senior-level positions; advising on the principles and methods for making the right hiring decisions; and helping hiring managers and other interviewers improve their skills through training, knowledge sharing, and case studies.
● Creating robust pipelines and leveraging existing talent programs and initiatives to attract potential candidates, and enhancing Alibaba's employer brand by promoting our talent strategy through all channels, including social networking and talent events.
● Being an ambassador for Alibaba Cloud and providing a great candidate experience that offers genuine insight into our culture and what it’s like to really work here.
● Effectively managing all resources, systems, and tools, and working to improve existing products and processes to drive recruiting efficiency and innovation.

Updated 2025-06-18 · Singapore
Alibaba Cloud
Experienced hire · 5+ years · Cloud Intelligence Group

Updated 2025-09-23 · Singapore
Ant Group
Experienced hire · 8+ years · General Functions - Legal

1. Work with each regulated entity’s compliance officers to standardize and harmonize the regulated entities’ outsourcing policies and procedures, and develop and maintain an overarching outsourcing framework, including due diligence processes, to manage the risks associated with outsourced activities;
2. Act as the owner and subject matter expert of the outsourcing framework and supporting procedures, working closely with leadership teams, Legal, Compliance, Product, Risk, Technology, and other teams to ensure that outsourced functions meet regulatory requirements in the licensed jurisdictions;
3. Manage internal governance (including Outsourcing Committee(s) where relevant), ensure approval processes are in place, and prepare all necessary information for management approval, including at Board level, as appropriate;
4. Work closely with Compliance on regulatory notifications and interactions;
5. Act as the primary interface between Legal and the regulated entities in the execution of inter-company agreements, including amendments;
6. Maintain the Outsourcing Register across multiple entities in conjunction with the Compliance and Management Teams;
7. Manage the internal monitoring and oversight process, including collating data, challenging service recipients, identifying improvements, tracking actions to completion, and reporting to the relevant governance forums up to and including Board level of the regulated entities;
8. Engage in periodic service reviews to ensure that all contracts remain consistent with the services delivered and that Key Performance Indicators (KPIs) and Key Risk Indicators (KRIs) remain within agreed tolerances;
9. Work with Compliance officers to complete regulatory submissions as required, and assist business functions in responding to regulatory queries, preparing for on-site visits, and handling follow-on requests for information;
10. Participate as necessary in projects for the termination of any outsourced service;
11. Serve as the outsourcing SME on matters related to outsourcing operations and the outsourcing framework.

Updated 2025-09-04 · Shanghai | Hangzhou