Telecommunications Science ›› 2024, Vol. 40 ›› Issue (6): 146-159. doi: 10.11959/j.issn.1000-0801.2024171

Challenges and key technologies of new Ethernet for intelligent computing center

Xiaodong DUAN, Jieyu LI, Weiqiang CHENG, Han LI, Ruixue WANG, Haojie WANG   

  1. China Mobile Research Institute, Beijing 100053, China
  • Received: 2024-04-01 Revised: 2024-06-13 Online: 2024-06-20 Published: 2024-07-11

Abstract:

AI large models are set to lead the ICT (information and communications technology) industry over the next decade. The intelligent computing center network is the communication foundation that supports distributed training of AI large models, and it is one of the key factors determining the efficiency of AI clusters. The data volume and parameter counts of AI large models keep growing, which poses serious challenges to intelligent computing center networks and also creates an opportunity for generational innovation in key network technologies. During AI large model training and inference, high-performance and highly secure data transmission are the two core requirements that AI workloads place on the intelligent computing network. Efficient load balancing, congestion control, and network security protocols are the key network technologies for meeting them. To address the challenges brought by large-scale AI workloads, global scheduling Ethernet (GSE) was proposed as a solution, and a realistic test environment was built to compare the performance of GSE and RoCE (RDMA over Converged Ethernet). The test results show that GSE significantly improves job completion time (JCT) compared with a RoCE network.

Key words: large model AI distributed training, GSE, load balancing, congestion control, network security protocol

