Big Data Research ›› 2021, Vol. 7 ›› Issue (2): 101-122. doi: 10.11959/j.issn.2096-0271.2021016

• TOPIC: VIRTUAL DATA SPACE FOR HIGH-PERFORMANCE COMPUTING •

Virtual data space system for national high-performance computing environment

Guangjun QIN¹, Limin XIAO²,³, Guangyan ZHANG⁴, Beifang NIU⁵,⁶, Zhiguang CHEN⁷

  • 1 Smart City College, Beijing Union University, Beijing 100101, China
    2 School of Computer Science and Engineering, Beihang University, Beijing 100191, China
    3 State Key Laboratory of Software Development Environment, Beijing 100191, China
    4 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
    5 Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
    6 University of Chinese Academy of Sciences, Beijing 100190, China
    7 School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
  • Online: 2021-03-15 Published: 2021-03-01
  • Supported by:
    The National Key Research and Development Program of China (2018YFB0203901)

Abstract:

The high-performance computing (HPC) environment is the core information infrastructure supporting national scientific and technological innovation, economic development, and national defense construction. Leading HPC nations around the world have been building wide-area HPC environments that pool the resources of multiple supercomputing centers. However, the resources in such environments are of many kinds and widely distributed geographically, so their aggregate capacity cannot be exploited effectively, and it is difficult to meet the requirements of large-scale applications for unified management of, and efficient access to, wide-area distributed data. To this end, a complete set of technologies for building a wide-area global virtual data space was proposed, including a virtual data space model, cross-domain virtual data space construction, efficient data migration in a wide-area environment, co-scheduling of storage resources and computing jobs, and cross-domain high-concurrency data aggregation processing. Based on these technologies, a virtual data space system was developed for the national high-performance computing environment (NHPCE). The system supports unified and efficient access to wide-area distributed heterogeneous storage resources, and allows data distributed across the wide-area environment to be shared and cooperatively processed across domains. The system has been experimentally deployed in NHPCE and verified with three typical large-scale applications: molecular docking, a genome-wide association study, and a weather forecasting model. The verification results show that the developed technologies and software system can effectively aggregate wide-area distributed storage resources and meet the data space requirements of large-scale applications.

Key words: high-performance computing environment, large-scale computing problem, virtual data space, wide-area distributed storage, unified namespace
