Big Data Research ›› 2024, Vol. 10 ›› Issue (1): 1-8. doi: 10.11959/j.issn.2096-0271.2024016

• STRATEGY RESEARCH •

Four issues to consider in building a computer system supporting large model training

Weimin ZHENG   

  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Online: 2024-01-15 Published: 2024-01-01

Abstract:

Three types of computer systems currently support large model training. Among them, the software ecosystem around systems built on domestic AI chips remains immature. To change this, 10 key pieces of software need to be developed, including AI compilers and parallel acceleration software. Systems based on supercomputers, in turn, require careful hardware-software co-design to serve large model training well. This article proposes four points of balanced design for building large model infrastructure, to ensure system performance, reliability, and scalability.

Key words: large model training, computer system, supercomputing system, large model infrastructure
