Current Issue

    15 September 2021, Volume 7 Issue 5
    TOPIC: BIG DATA PROCESSING SYSTEM IN CHINA’S HOMEMADE COMPUTING ENVIRONMENT
    Design of big data processing system supporting multi-satellites and multi-tasks
    Fuli MA, Tao SHI, Ling CHEN, Yan ZHENG, Senlin XIONG
    2021, 7(5):  3-16.  doi:10.11959/j.issn.2096-0271.2021045

    As China launches more and more space science satellites, the volume of scientific data has grown explosively, and processing this space science satellite big data has gradually become a key link in space science innovation. A high-performance ground data processing system is an important driver for building a controllable Chinese space science big data ecosystem and for boosting scientific output. To address the multi-level classification, multi-source product integration and high timeliness requirements of space science satellite data processing, and to meet the need for parallel processing across multiple satellites and multiple tasks, a high-reliability hardware system design suited to big data processing scenarios was proposed, together with a unified resource scheduling system based on task-type awareness that reflects the characteristics of scientific satellite processing tasks. Based on these designs, a scalable ground big data processing system for space science was developed, which has successfully supported data processing tasks for space science missions of the Chinese Academy of Sciences.

    Data processing system for HEP based on domestic processor architecture
    Yaodong CHENG, Yaosong CHENG, Yujiang BI, Yu GAO, Haibo LI, Lu WANG, Qiuling YAO
    2021, 7(5):  17-30.  doi:10.11959/j.issn.2096-0271.2021046

    More and more scientific data are produced by fast-developing high energy physics (HEP) experiments, which urgently require advanced data processing systems to support scientific research. At present, HEP data processing systems face new opportunities and challenges with the rapid development of domestic CPUs, such as those based on the ARM architecture. Firstly, a brief introduction to the requirements and architecture of HEP data processing systems was given. Then related work, such as porting software to domestic CPU architectures, was described. Additionally, a computational storage technology for HEP data processing was proposed. Finally, evaluation results of typical HEP applications on domestic CPU architectures were given.

    Software infrastructures for Chinese supercomputers from the perspective of lattice QCD applications
    Ming GONG, Xiangyu JIANG, Ying CHEN, Zhaofeng LIU
    2021, 7(5):  31-39.  doi:10.11959/j.issn.2096-0271.2021047

    Lattice QCD (LQCD) is a frontier scientific field that studies elementary particles by numerical simulation, and it has become one of the major scientific applications of supercomputers. With the rapid development of Chinese supercomputers, LQCD software needs to be refactored because of the limitations of its traditional programming model. The characteristics of scientific applications on supercomputers were reviewed from the perspective of lattice QCD. A novel programming model targeting Chinese supercomputers was proposed to adapt large-scale scientific applications with big data processing, which is a promising development direction for the basic software of the Chinese supercomputing ecosystem.

    Big data of numerical nuclear reactor and its application
    An WANG, Shuai REN, Xue MIAO, Lingyu DONG, Ying ZHU, Dandan CHEN, Changjun HU
    2021, 7(5):  40-56.  doi:10.11959/j.issn.2096-0271.2021048

    The massive amount of data involved in the operation of a numerical nuclear reactor (numerical reactor) can be used to optimize existing numerical reactor models, obtain scientific discoveries in the field of nuclear energy, and promote numerical reactor research. Based on a review of existing data-driven modeling and the prediction of microscopic phenomena in reactors, the concept of numerical nuclear reactor big data was put forward, and its important characteristics as both industrial and simulation big data were analyzed in light of the characteristics of the nuclear energy field. Taking China virtual reactor 1.0 (CVR 1.0) as an example, and starting from the variety, dependency and inaccuracy of numerical nuclear reactor big data, research on model optimization and scientific discovery was carried out using multidisciplinary techniques such as neural networks, mathematical statistics and numerical analysis, illustrating the guiding role of these characteristics in numerical reactor research.

    Research and implementation of edge cache system in global virtual data space across WAN
    Jiantong HUO, Limin XIAO, Zhisheng HUO, Yaowen XU
    2021, 7(5):  57-81.  doi:10.11959/j.issn.2096-0271.2021049

    To address the large amount of network bandwidth wasted by redundant data transmission when edge clients access and share remote data in the global virtual data space (GVDS), caching technology in the wide-area virtual data space system was studied and an edge caching mechanism that optimizes the data access path was proposed. Data is cached at file granularity close to the edge client, thereby improving the performance of upper-level applications that access and share data. The experimental results show that the proposed edge cache system, as a supplement to the GVDS, can improve the performance of wide-area data sharing.
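
The idea of caching whole files close to the edge client can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `EdgeFileCache`, the `fetch_remote` callback standing in for a wide-area read, and the LRU eviction policy are all assumptions made for illustration only.

```python
from collections import OrderedDict
from typing import Callable


class EdgeFileCache:
    """Whole-file LRU cache near an edge client (hypothetical illustration)."""

    def __init__(self, capacity_bytes: int, fetch_remote: Callable[[str], bytes]):
        self.capacity_bytes = capacity_bytes
        self.fetch_remote = fetch_remote  # assumed stand-in for a WAN read
        self._files: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        # Cache hit: serve locally and mark the file as most recently used.
        if path in self._files:
            self._files.move_to_end(path)
            self.hits += 1
            return self._files[path]
        # Cache miss: one wide-area transfer, then keep the whole file locally.
        self.misses += 1
        data = self.fetch_remote(path)
        self._insert(path, data)
        return data

    def _insert(self, path: str, data: bytes) -> None:
        if len(data) > self.capacity_bytes:
            return  # file larger than the cache: pass through without caching
        # Evict least recently used files until the new file fits.
        while self._used + len(data) > self.capacity_bytes:
            _, evicted = self._files.popitem(last=False)
            self._used -= len(evicted)
        self._files[path] = data
        self._used += len(data)
```

Repeated reads of the same path then cost a single remote transfer, which is the bandwidth saving the abstract describes; a real system would also need consistency handling for files that change remotely.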

    A wide-area collaborative scheduling system oriented to big data processing applications
    Chenhao ZHANG, Limin XIAO, Guangjun QIN, Yao SONG, Shixuan JIANG, Jiye WANG
    2021, 7(5):  82-97.  doi:10.11959/j.issn.2096-0271.2021050

    Based on the high-performance computing global virtual data space system, a wide-area collaborative scheduling system for big data processing applications was designed and implemented. This system addresses how big data processing applications can use wide-area storage and computing resources in a unified way. It schedules application data and computing tasks collaboratively, based on the computing characteristics of the application and the data layout, through collaborative scheduling, load balancing scheduling and data locality scheduling strategies. By scheduling application data and computing tasks in a unified way across the wide-area environment, it coordinates the use of wide-area computing and storage resources and effectively improves the running performance of big data processing applications. Test results in the national high-performance computing environment show that the proposed scheduling method can support big data processing applications effectively, and that the running efficiency of typical applications such as wide-area collaborative target recognition and molecular docking can be increased by 3 to 4 times.

    COLUMN: DATA-DRIVEN OPTIMIZATION
    Optimization from samples
    Zhijie ZHANG, Xiaoming SUN, Jialin ZHANG, Wei CHEN
    2021, 7(5):  100-110.  doi:10.11959/j.issn.2096-0271.2021051

    Optimization from samples (OPS) studies how one can optimize objective functions from the sample data that one uses to learn them. Firstly, the mathematical model of this problem, the optimization from samples model, was introduced along with the inapproximability results under it. Secondly, some approaches and variants of OPS were introduced that circumvent the impossibility results and make optimization possible. Thirdly, one of the variants, the optimization from structured samples model, was examined, and algorithms for the maximum coverage and influence maximization problems under it were introduced in detail. Finally, the paper was concluded and some future research directions for the problem were proposed.
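
For readers unfamiliar with the maximum coverage problem mentioned above, the following sketch shows the classical greedy algorithm for the full-information setting, where every candidate set is known exactly. It is not the sample-based algorithm of the paper; the function name `greedy_max_coverage` and the example data are assumptions for illustration.

```python
from typing import Dict, Hashable, List, Set


def greedy_max_coverage(sets: Dict[Hashable, Set], k: int) -> List[Hashable]:
    """Pick up to k sets, each time taking the one that covers the most new elements."""
    covered: Set = set()
    chosen: List[Hashable] = []
    for _ in range(min(k, len(sets))):
        remaining = [name for name in sets if name not in chosen]
        # Marginal gain of a set = number of its elements not yet covered.
        best = max(remaining, key=lambda name: len(sets[name] - covered))
        if not sets[best] - covered:
            break  # no remaining set adds anything new
        chosen.append(best)
        covered |= sets[best]
    return chosen


if __name__ == "__main__":
    example = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}}
    print(greedy_max_coverage(example, 2))
```

In the optimization from samples setting the sets themselves are not given and must be inferred from sampled data, which is what makes the problem hard.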

    Combinatorial online learning based on optimizing feedbacks
    Fang KONG, Yueran YANG, Wei CHEN, Shuai LI
    2021, 7(5):  111-130.  doi:10.11959/j.issn.2096-0271.2021052

    Combinatorial online learning studies how to learn unknown parameters and gradually find the optimal combination of targets while interacting with the environment. The problem has a wide range of applications, including advertisement placement, search and recommendation. Firstly, the definition of combinatorial online learning and its general framework, the combinatorial multi-armed bandit problem, were introduced, and its traditional algorithms and research progress were summarized. Then, related work on two specific applications, online influence maximization and online learning to rank, was introduced. Finally, prospective directions for further research on combinatorial online learning were discussed.
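
As a rough illustration of the combinatorial multi-armed bandit framework, the sketch below plays the k arms with the highest upper confidence bounds in each round and observes a reward for every arm played (semi-bandit feedback). It is a CUCB-style sketch under simplifying assumptions, not code from the paper: the function name `cucb_top_k`, the `pull` callback and the top-k reward structure are all hypothetical choices.

```python
import math
from typing import Callable, List


def cucb_top_k(n_arms: int, k: int, rounds: int,
               pull: Callable[[int], float]) -> List[int]:
    """Return how often each arm was played after `rounds` rounds."""
    counts = [0] * n_arms    # number of times each arm was played
    means = [0.0] * n_arms   # empirical mean reward of each arm

    def ucb(i: int, t: int) -> float:
        if counts[i] == 0:
            return float("inf")  # force every arm to be tried at least once
        return means[i] + math.sqrt(1.5 * math.log(t) / counts[i])

    for t in range(1, rounds + 1):
        # Oracle for the top-k problem: take the k arms with the largest UCB.
        super_arm = sorted(range(n_arms), key=lambda i: ucb(i, t), reverse=True)[:k]
        for i in super_arm:
            reward = pull(i)  # semi-bandit feedback for each played arm
            counts[i] += 1
            means[i] += (reward - means[i]) / counts[i]
    return counts
```

With rewards in [0, 1], arms whose mean is clearly below the best k are played only a logarithmic number of times, so most plays concentrate on the best combination.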

    Applications of reinforcement learning in the field of resource optimization
    Jinyu WANG, Xinran WEI, Wenlei SHI, Jia ZHANG
    2021, 7(5):  131-149.  doi:10.11959/j.issn.2096-0271.2021053

    Resource optimization is an important problem that arises widely in social operation and economic development. The massive data accumulated in this field has laid the foundation for the growing application of reinforcement learning. Because resource optimization covers a wide range of problems, three important categories were chosen: the resource balancing problem, the resource allocation problem and the bin packing problem. The problem formulation and the reinforcement learning agent modeling for each of these three problem types were introduced in detail.

    STUDY
    Method of accelerating deep learning with optimized distributed cache in containers
    Kai ZHANG, Yang CHE
    2021, 7(5):  150-163.  doi:10.11959/j.issn.2096-0271.2021054

    When GPUs are used to train deep learning models on large-scale datasets, the data loading and preprocessing stages often reduce overall performance notably, and much GPU computing capacity is wasted waiting for data to load from remote storage. Firstly, methods of accelerating deep learning training with containers and a distributed cache were introduced, along with the architecture and initial optimization of such a training system implemented with Alluxio and Kubernetes. Secondly, task and data co-located scheduling (TDCS) and its co-located scheduling policy were elaborated. Thirdly, TDCS was implemented in a Kubernetes cluster, which made the acceleration more extensible. Finally, results from training a ResNet50 image classification model on 128 NVIDIA V100 GPUs demonstrate that the proposed methods bring a 2 to 3 times speedup compared with loading data directly from remote storage.

    Legal judgment prediction based on legal judgment documents
    Hu ZHANG, Bangze PAN, Hongye TAN, Ru LI
    2021, 7(5):  164-175.  doi:10.11959/j.issn.2096-0271.2021055

    According to the practical needs of the “legal judgment prediction” task in intelligent judicial services, the research ideas and implementation approaches were discussed, and the overall framework and specific process of the task were introduced. Based on the large number of real cases obtained from China Judgments Online and the CAIL2018 evaluation dataset, the categories were sorted out, the format of the experimental dataset was standardized, and a dataset for legal judgment prediction based on legal judgment documents was built. For the judgment prediction model, high-quality sentences were extracted using a decision-element extraction method. Then, following the judge’s reasoning, the overall task was transformed into three subtasks: law article prediction, charge prediction and penalty prediction. Prediction models based on the judgment elements were constructed for each subtask. The experimental results show that the proposed methods achieve excellent results on the criminal law judgment prediction dataset.
