T-Head (平头哥) - Senior Expert, Chip Interconnect Design - Shanghai
Qualifications
* Minimum Bachelor's degree in Computer Science or Electronics Engineering; M.S. or Ph.D. is preferred.
* Minimum of 8 years (with M.S.) or 5 years (with Ph.D.) of experience in computer architecture or network chip design with proven silicon results. Experience in the AI chip, switch chip, RDMA, or RoCE sub-domains is preferred.
* Strong experience in at least one of the following areas is a must:
Server-level AI chip design.
Smart NIC/RDMA/RoCE design.
State-of-the-art Switch chip design (12.8T and above).
* Hands-on experience with high-speed interfaces, such as die-to-die, high-speed long-reach SerDes, MAC/PCS, and FEC, is a plus.
* Hands-on experience with interconnects in the AI domain is a big plus.
* Strong knowledge of AI infrastructure and AI networks, including scale-up/scale-out networks, as well as knowledge of LLM inference and training, is a big plus.
* Good verbal and written communication skills.
Responsibilities
In this role, you will work with software and hardware engineering groups to define the next-generation inter-chip network architecture for high-performance AI chips and AI networks.
Requirements of the job
* Identify challenging problems and evaluate candidate solutions for the next generation of networks for AI chips and AI Super Pods.
* Strongly influence future AI products through advanced architecture design, serving as an effective interface between software and hardware.
* Document the high-level architecture specification that defines the inter-chip network subsystem for AI chips.
* Participate in the front-end implementation of key subsystems.
* Provide strong technical leadership to achieve successful delivery of the final silicon product.
* Work closely with the design, system, and verification teams.
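1. Work with the architecture, software, design, and other teams to build a verification platform for high-end chip designs;
2. Own and lead the definition of verification methodology and verification strategy, and develop high-performance verification architectures;
3. Own and lead testbench (TB) development, environment development, and test vector development and debugging for data-center chip interconnect verification, as well as coverage collection and development of the overall DV sign-off flow;
4. Own and lead the writing of chip verification documentation, including testbench construction and implementation, the test plan, etc.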
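1. Enable and develop the full range of compute, interconnect, monitoring, and related capabilities for rack-scale server products;
2. Identify business workloads suited to rack-scale heterogeneous server products and design and develop the corresponding Kunlun (昆仑) components, including performance evaluation and analysis, container images, etc.;
3. Based on the rack-scale hardware architecture and business deployment scenarios, construct fault and anomaly cases and design fault diagnosis solutions;
4. Track and study mainstream GPU architecture design techniques, and participate in the design of next-generation AI infrastructure;
5. Based on business profiles, build competitive, quantitative end-to-end analyses of heterogeneous hardware and systems to form a database for data-driven decisions; effectively drive the planning and delivery of heterogeneous server products.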
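1. Technology insight and problem definition
• Gain insight into the direction of network technology, be familiar with and help define technical standards and frontier developments, and track the technologies and methods of key competitors.
• Define network problems based on the current state of the product; understand business goals and break them down into network technology problems to be solved.
2. Architecture planning
• Abstract the business's network requirements into an architecture, and establish a network architecture model and plan for the long-term growth of the business.
• Develop the network architecture plan and technology roadmap for the owned domain, set system goals and direction, and weigh factors such as key technology selection and deployment architecture to ensure stable, high-quality evolution of the network architecture.
3. Architecture design
• Design network architecture solutions, covering scale capability, architecture interconnect, routing, high reliability, etc.
• Break key design goals down into architecture and system design solutions, and carry out detailed architecture design by comprehensively evaluating technology selection, cost, stability, deployment complexity, and other factors.
4. Architecture delivery
• Design in detail the dependencies required to deliver the architecture, and drive the relevant components and teams to design and develop in support of delivery.
• Design test plans that comprehensively test and evaluate the key features and components of the architecture design, ensuring that technical feasibility, performance, stability, etc. meet design expectations.
• Produce the detailed architecture design (LLD) documents and architecture test documents, and provide architecture operations guidance to the operations team.
5. Technical knowledge building and enablement
• Share technical architecture within the team, build up technical documentation and architecture standards, research and analyze competitor technologies, learn and introduce new network technologies, and learn and share the business architecture of the owned business areas, in particular becoming the expert on the networking aspects of the business, and capture all of this in documentation.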