Big Data Research

Discovery of irradiation mechanism based on big data of material simulation

Shuai REN, Dandan CHEN, Genshen CHU, He BAI, Huizhao LI, Yuanjie HE, Changjun HU

2021, 7(6): 3-18. doi:10.11959/j.issn.2096-0271.2021056

Numerical simulation of material irradiation effects is an important means of understanding the performance of nuclear materials. Large-scale, high-fidelity numerical simulation of materials on supercomputers produces a large amount of numerical calculation data. Mining and analyzing these data on top of efficient storage, in a way suited to the characteristics of numerical calculation big data, helps reveal how irradiation damage mechanisms and material performance evolve, which is of great significance for the design and development of nuclear materials and for nuclear safety. The concept of material simulation big data (MSBD) was proposed, its characteristics and significance were introduced, and related work was reviewed. Based on practical examples of MISA-MD and MISA-SCD on domestic supercomputers, a distributed numerical data storage architecture (NDSA) with multi-scale correlation and coupling was proposed. Frenkel defect pairs were accurately calculated with the XGBoost algorithm based on the MSBD of MD, and the cascade collision clusters were artificially divided with the Union-Find algorithm. The data of KMC numerical calculation were mined with a density clustering method, and cluster recognition and classification were realized. Ring-like clusters were found in the MSBD of KMC with the density clustering algorithm, which was verified against the literature. A DNN-based potential model, AIPM, was proposed using the MSBD of first-principles-based potential data. Further applications of MSBD in physical modeling and knowledge discovery were discussed and prospected.
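The abstract does not describe how Union-Find is applied to the cascade data, so the following is only a minimal sketch of the general technique: defect sites closer than a distance cutoff are merged into the same cluster. The function name `group_defects`, the coordinates and the `cutoff` value are illustrative assumptions, not taken from the paper.

```python
import math


class UnionFind:
    """Disjoint-set structure with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree to the larger one
        self.size[ra] += self.size[rb]


def group_defects(points, cutoff):
    """Group point indices into clusters; 'cutoff' is an assumed linking distance."""
    uf = UnionFind(len(points))
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                uf.union(i, j)
    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(uf.find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical defect coordinates (arbitrary units).
defects = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 10, 10), (10, 11, 10)]
print(group_defects(defects, cutoff=1.5))
```

The pairwise loop is quadratic in the number of defects; for large simulation outputs a neighbor list or spatial grid would normally replace it.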

Legal element extraction method based on BERT reading comprehension framework

Hui HUANG, Yongbin QIN, Yanping CHEN, Ruizhang HUANG

2021, 7(6): 19-29. doi:10.11959/j.issn.2096-0271.2021057

Extraction of legal elements is an important basis for intelligent judicial assistance applications; its purpose is to identify the key elements involved in a judgment document. In the past, legal element extraction was usually modeled as multi-label classification. These methods rely mainly on the text features of the judgment document and ignore the label features. Besides, because judicial data sets are imbalanced, the classification approach performs poorly owing to the excess of negative examples. To solve these problems, a legal element extraction method based on a BERT reading comprehension framework was proposed. The method constructs auxiliary questions from label information and legal prior knowledge, and uses a BERT-based machine reading comprehension model to establish semantic associations between the question and the judgment document. It also adds special tokens before and after the label in the question to enhance the learning ability of the model. Experiments were conducted on the legal element extraction data sets of CAIL2019. The results show that performance improves significantly: the F1 value increases by 2.7%, 11.3%, and 5.6% on the data sets of marriage and family cases, labor dispute cases, and loan contract dispute cases, respectively.
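The abstract gives neither the question template nor the special tokens, so the sketch below only illustrates the idea of turning each label into a marked question paired with the document. The template wording, the `<e>` and `</e>` markers, the labels and the hints are all hypothetical.

```python
def build_question(label, hint="", open_tok="<e>", close_tok="</e>"):
    """Wrap the label in marker tokens and phrase it as a question.

    The template and the marker tokens are assumptions made for illustration.
    """
    question = f"Does the judgment document involve {open_tok} {label} {close_tok}?"
    if hint:
        question = f"{question} {hint}"  # optional prior-knowledge hint
    return question


def build_mrc_pairs(document, label_hints):
    """Return one (question, document) pair per candidate label."""
    return [(build_question(label, hint), document)
            for label, hint in label_hints.items()]


# Hypothetical labels and hints for a loan contract dispute.
labels = {
    "interest agreed": "Interest terms are usually stated in the loan contract.",
    "guarantor liability": "",
}
for question, _ in build_mrc_pairs("The defendant borrowed 50,000 yuan ...", labels):
    print(question)
```

Each pair would then be passed to a reading comprehension model, which decides per question whether the element is present. This turns one multi-label decision into several binary ones.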

Charge prediction method combined with case elements sequence

Qian SUN, Yongbin QIN, Ruizhang HUANG, Lijuan LIU, Yanping CHEN

2021, 7(6): 30-40. doi:10.11959/j.issn.2096-0271.2021058

Charge prediction aims to find the appropriate charges based on the facts of a given case. Existing methods mainly classify the text content and cannot effectively use case elements. To address this shortcoming, a charge prediction method based on the sequence of case elements was put forward. The method expresses the factual process of a case as a series of case elements with “behavior” as the core and linked by time-series relationships. A graph convolutional network was then used to represent this sequence. Finally, the semantic features of the text were fused in to predict the charge. Experiments show that this method predicts better than existing methods and also performs well at distinguishing easily confused charges.

Chinese comment sentiment analysis method based on multi-input model and syntactic structure

Baohua ZHANG, Huaping ZHANG, Tieshuai LI, Jianyun SHANG

2021, 7(6): 41-52. doi:10.11959/j.issn.2096-0271.2021059

Massive network texts have brought huge opportunities and challenges to sentiment analysis tasks. Traditional rule-based methods struggle to analyze such texts, and existing deep learning methods have shortcomings. On the one hand, the model input includes only the text embedding matrix and makes no use of other features. On the other hand, word embedding algorithms lose text structure information, which affects the result. Based on research into the syntactic rules used in rule-based sentiment analysis methods, a multi-input model combining MCNN, LSTM and a fully connected neural network was proposed. Meanwhile, a syntactic feature extractor was constructed to incorporate syntactic features into the deep learning model. Experiments were conducted on three public data sets. The results show that the proposed model has better classification performance than other models, and that introducing syntactic rule features slightly improves the classification performance of the model.

Applications of big data cognitive computing in content security governance

Xuetao DU

2021, 7(6): 53-66. doi:10.11959/j.issn.2096-0271.2021060

In communication networks, a mass of bad information must be read and understood in order to extract useful knowledge and features for governance. Methods based on manual analysis cannot achieve this goal. It is necessary to adopt big-data-based cognitive computing technology to help understand massive data and customize content security strategies. Aiming at four practical problems, namely telecommunication fraud governance, bad message governance, variant message governance and bad website governance, big data cognitive computing solutions were put forward and practical results were given. The results show that the solutions can find bad information quickly and effectively improve the quality of content security governance.

Algorithm of locality sensitive hashing bit selection based on feature selection

Wenhua ZHOU, Huawen LIU, Enhui LI

2021, 7(6): 67-77. doi:10.11959/j.issn.2096-0271.2021061

Locality sensitive hashing is one of the most popular information retrieval methods, and it needs to generate long hashing bits to meet the retrieval requirement. However, long hashing bits require huge storage space and contain plenty of redundant bits. To solve this problem, ten simple and efficient selection algorithms from feature engineering were adopted to extract the hashing bits that carry the most information from the long hashing bits generated by locality sensitive hashing, and the redundant and useless hashing bits were removed. The ten algorithms try to capture the performance of each hashing bit or the correlation among bits, using measures such as variance and Hamming distance. During selection, useless or highly correlated hashing bits were removed. The selected hashing bits were then compared with the original long hashing bits. Experimental results on four common datasets show that the selected hashing bits work as well as the original hashing bits, with reduction ratios ranging from 30% to 70%.
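As a rough illustration of variance-based bit selection, one of the measures the abstract names, the sketch below generates hashing bits with random hyperplanes and keeps the bits whose values vary most across the data set. The random-hyperplane hash family, the function names and the `keep` parameter are assumptions; the abstract does not say which LSH family or which ten selection algorithms were used.

```python
import random


def lsh_codes(vectors, n_bits, seed=0):
    """Sign-of-projection hashing with random hyperplanes (an assumed LSH family)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
    return [[1 if sum(p * x for p, x in zip(plane, v)) >= 0 else 0
             for plane in planes]
            for v in vectors]


def bit_variances(codes):
    """Variance of each bit position; for a 0/1 bit with mean p this is p * (1 - p)."""
    n = len(codes)
    variances = []
    for b in range(len(codes[0])):
        mean = sum(code[b] for code in codes) / n
        variances.append(mean * (1 - mean))
    return variances


def select_bits(codes, keep):
    """Keep the 'keep' highest-variance bit positions; ties keep the earlier position."""
    var = bit_variances(codes)
    order = sorted(range(len(var)), key=lambda b: var[b], reverse=True)
    chosen = sorted(order[:keep])
    return chosen, [[code[b] for b in chosen] for code in codes]


def hamming(a, b):
    """Hamming distance between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical data: 6 random 4-dimensional vectors hashed to 16 bits, reduced to 8.
rng = random.Random(1)
data = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
codes = lsh_codes(data, n_bits=16)
chosen, reduced = select_bits(codes, keep=8)
print(chosen, hamming(reduced[0], reduced[1]))
```

A bit that is constant over the data set has zero variance and cannot separate any two items, which is why it is a natural candidate for removal. Correlation-based selection would additionally drop bits that duplicate one another.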

A survey of persistent index data structures on non-volatile memory

Yongfeng WANG, Zhiguang CHEN

2021, 7(6): 78-88. doi:10.11959/j.issn.2096-0271.2021062

With non-volatile memory becoming commercially available, the design and implementation of traditional storage systems need fundamental change, since they cannot fully utilize the performance of non-volatile memory. To build a high-throughput, low-latency, large-scale storage system, there is an urgent need for efficient persistent index data structures adapted to the characteristics of non-volatile memory. In terms of persistent index data structures, the optimizations applied to the B+-Tree and hash table on non-volatile memory were summarized, and the pros and cons of these schemes were compared. Future research directions, along with the challenges and opportunities involved, were also presented.

A review and comparative analysis of domestic and foreign research on big data pricing methods

Nan LIU, Xuejing HAO, Yuhong CHEN

2021, 7(6): 89-102. doi:10.11959/j.issn.2096-0271.2021063

Due to the value characteristics of big data itself, the problem of data pricing is complicated. Although researchers have studied it extensively, most work takes a single perspective and lacks practical application. In view of this, big data pricing methods were reviewed and five types of pricing were sorted out: cost-oriented, market-oriented, demand-oriented, profit-oriented, and life-cycle-based pricing. The advantages and disadvantages of six mainstream pricing methods were compared: the cost method, agreement pricing, the market method, the income method, quality-based pricing and query-based pricing. Finally, through an analysis of the big data pricing process, the characteristics of the different pricing methods were further revealed, and the direction of data pricing was forecast. The article aims to provide a reference for future related research.

Research on the integration of water environment model and big data technology

Jinfeng MA, Kaifeng RAO, Ruonan LI, Jing ZHANG, Hua ZHENG

2021, 7(6): 103-119. doi:10.11959/j.issn.2096-0271.2021064

Applications of water environment models are greatly limited by the complex internal structure of the models and by time-consuming calculations; significant computational burdens arise during parameter calibration, multi-scenario analysis, and decision-making optimization. How to integrate water environment models with big data technology, explore the potential of model application in depth and give full play to its application value is a research hotspot. The bottlenecks faced by water environment models in practical application were summarized, and the potential of big data technology for solving these problems was analyzed. Based on existing big data technology, a framework for integrating water environment models with big data technology was proposed to address the large-scale calculation, large-scale storage and application analysis of water environment models. The problems faced in integrating models with big data technologies were described, and specific technical means of implementation were proposed. A case study on the calibration of the SWAT model was used to demonstrate the feasibility of the proposed framework. Finally, future research directions for water environment modeling in the context of big data were discussed, and it was concluded that research on surrogate modeling of complex water environment models and on water environment simulation and optimization frameworks is the future development trend.

Management control and application of the time-frequency scientific data

Yu ZHANG, Haibo YUAN, Yanping WANG, Shaowu DONG, Jihai ZHANG

2021, 7(6): 120-127. doi:10.11959/j.issn.2096-0271.2021065

The time-frequency system has become a national strategic resource. The application of time-frequency scientific data is related to communications, electricity, transportation, warfare, etc. The sorting, management control, and application analysis of time-frequency scientific data therefore have important practical significance. Firstly, classification and data sharing strategies for time-frequency scientific data were proposed, and the detailed structure of the time-frequency scientific data center management system and the quality control methods for time-frequency scientific data were discussed. Then the problems faced by the open sharing of time-frequency scientific data were analyzed and solutions were given. Finally, several application directions of time-frequency scientific data were explained, and the management of time-frequency scientific data was summarized and prospected.

Analysis of application barriers for big data in construction field based on ISM

Yingbo JI, Zihao ZHAO, Fuyi YAO

2021, 7(6): 128-137. doi:10.11959/j.issn.2096-0271.2021066

The degree of application of big data in the construction field is low, and its promotion is slow. It is of great significance to accurately identify the factors that hinder the application of big data in the construction industry and to explore the interactions between these factors. By sorting out the relevant research work, 12 barrier factors were identified. Using interpretive structural modeling (ISM), the relationships between factors were determined and transformed into an adjacency matrix. Through power iteration, the reachability matrix was established and the hierarchical relationship of the factors was determined. Finally, the influence transmission paths between factors were studied and analyzed, and corresponding countermeasures were given, providing research support for the application and promotion of big data in China’s construction field.
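The ISM steps named in the abstract (adjacency matrix, reachability matrix by power iteration, hierarchy of factors) can be sketched as follows. This is the standard textbook procedure under stated assumptions, not the authors' code. The four-factor adjacency matrix is invented for illustration and is unrelated to the 12 barrier factors of the paper.

```python
def reachability(adj):
    """Boolean powers of (A + I), repeated until the matrix stops changing."""
    n = len(adj)
    m = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    while True:
        nxt = [[any(m[i][k] and m[k][j] for k in range(n)) for j in range(n)]
               for i in range(n)]
        if nxt == m:
            return [[int(v) for v in row] for row in m]
        m = nxt


def levels(reach):
    """Level partition: a factor is at the current top level when its
    reachability set is contained in its antecedent set."""
    remaining = set(range(len(reach)))
    result = []
    while remaining:
        level = []
        for i in remaining:
            reach_set = {j for j in remaining if reach[i][j]}
            antecedents = {j for j in remaining if reach[j][i]}
            if reach_set <= antecedents:
                level.append(i)
        result.append(sorted(level))
        remaining -= set(level)
    return result


# Hypothetical influence relations: 0 -> 1, 1 <-> 2, 3 -> 2.
adjacency = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]
r = reachability(adjacency)
print(r)
print(levels(r))
```

In this convention the first level holds the factors that influence nothing outside their own group, and later levels hold progressively deeper root causes.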

Value mining and application of big data in enterprise power credit investigation

Baojiang XIN, Dewen LI, Lanlan WANG

2021, 7(6): 138-146. doi:10.11959/j.issn.2096-0271.2021067

Aiming at the shortcomings of traditional power credit investigation platforms, such as insufficient stability and low test accuracy, a big data power credit investigation platform was designed. An online analytical method was used to analyze power big data, which was divided into four categories: user behavior, expense rules, user value and personal credit. Based on a modular structure, the data acquisition module, data analysis module and user interaction module were each optimized. KNN and cross-validation were used to classify and process the power consumption data and obtain the regional power consumption patterns, so as to design and adjust the distribution scheme. Finally, the platform was compared with the traditional power credit investigation platform. The experimental results show that the stability and accuracy of the platform are improved, with accuracy reaching 98.9% in testing.
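A minimal sketch of KNN evaluated with k-fold cross-validation is given below, assuming numeric feature vectors and categorical labels. The two features, the "low" and "high" labels and the fold assignment by index are hypothetical, since the abstract does not describe the platform's actual data or settings.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points nearest to 'query'."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def cross_validate(xs, ys, k, folds):
    """Accuracy over 'folds' folds; sample i is assigned to fold i % folds."""
    n = len(xs)
    correct = 0
    for f in range(folds):
        train_idx = [i for i in range(n) if i % folds != f]
        test_idx = [i for i in range(n) if i % folds == f]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in test_idx:
            if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
                correct += 1
    return correct / n


# Hypothetical features: (monthly consumption, peak load), arbitrary units.
xs = [(1.0, 0.2), (9.0, 3.1), (1.2, 0.1), (8.7, 2.9),
      (0.9, 0.3), (9.3, 3.0), (1.1, 0.25), (8.9, 3.2)]
ys = ["low", "high", "low", "high", "low", "high", "low", "high"]
print(cross_validate(xs, ys, k=3, folds=4))
```

Cross-validation is typically also used to choose `k` itself, by comparing the accuracy obtained for several candidate values.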
