Big Data Research

Announcement

Topic’s List of Big Data Research 2022-05-19

Special Essay Solicitation for Big Data .. 2022-06-23

Special Essay Solicitation for Big Data .. 2022-03-01

Special Essay Solicitation for Big Data .. 2021-11-16

Editors Recommend

A survey of expressive speech synthesis

Haobin TANG, Xulong ZHANG, Jianzong WANG, Ning CHENG, Jing XIAO

2023 9(6): 53-71 doi:10.11959/j.issn.2096-0271.2022082

Research on the internal logic and solution of the “Channel Computing Resources from the East to the West” project

Nannan TONG, Dong CHEN, Huiying LI, Honglin ZHU

2023 9(5): 9-19 doi:10.11959/j.issn.2096-0271.2023055

15 May 2024, Volume 10 Issue 3

TOPIC: GOVERNMENT DATA PROCESSING

Government Data Processing

2024, 10(3): 1-2. doi:10.11959/j.issn.2096-0271.2024004-1

Research progress of government data identification technology and the next generation government data identification system

Yun WANG, Yifeng GUO, Xiaoliang SU, Wuai ZHOU, Wanzhe ZHANG, Dahu XU, Qiang ZHOU, Jianhua FENG

2024, 10(3): 3-15. doi:10.11959/j.issn.2096-0271.2024004

Government data identification is fundamental to building a national integrated government big data system. This article reviewed the research progress of data identification technology, compared the similarities and differences among the coding rules of different data identification technologies, and further reviewed the progress of government data identification and its applications. Because government data requires clearly defined rights and responsibilities, high security, and strong compatibility, a next-generation government data identification system, Gcode, was proposed. Gcode consists of three parts: an external code, an internal code, and a security code. The external code is compatible with the Unified Social Credit Code, the internal code establishes an "institution-department-system-data" association, and the security code provides anti-counterfeiting verification by introducing blockchain technology. Gcode offers clear rights and responsibilities, strong compatibility, and high security, and can support cross-level, cross-region, cross-system, cross-department, and cross-business sharing of government data, effectively promoting the implementation of "one data, one source" for government data.
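To make the three-part structure concrete, here is a minimal Python sketch of a Gcode-style identifier. Everything in it is an assumption rather than the published specification: the dot-separated layout, the `department-system-dataset` internal code, the truncated SHA-256 digest used as the security code, and the in-memory `LEDGER` dictionary that stands in for the blockchain anchor. The 18-character length check reflects the length of a Unified Social Credit Code, but its check digit is not validated.

```python
import hashlib

# Hypothetical stand-in for the blockchain ledger: identifier body -> registered digest.
LEDGER: dict = {}


def make_gcode(credit_code: str, department: str, system: str, dataset: str) -> str:
    """Compose a hypothetical Gcode-style identifier: external.internal.security."""
    if len(credit_code) != 18:
        raise ValueError("the external code must be an 18-character credit code")
    external = credit_code                          # external code: the institution
    internal = f"{department}-{system}-{dataset}"   # internal code: department-system-data
    body = f"{external}.{internal}"
    # Security code: a truncated digest of the first two parts (assumption).
    security = hashlib.sha256(body.encode("utf-8")).hexdigest()[:16]
    LEDGER[body] = security                         # "anchor" the digest in the ledger
    return f"{body}.{security}"


def verify_gcode(code: str) -> bool:
    """Check that the security code matches the digest registered for this body."""
    body, _, security = code.rpartition(".")
    return LEDGER.get(body) == security


code = make_gcode("91110000MA00000000", "D01", "S02", "T0003")
print(code)
print(verify_gcode(code))                             # True
print(verify_gcode(code.replace("T0003", "T0004")))   # False: tampered internal code
```

The tampered code fails verification because its altered body was never registered, which is the property the blockchain anchor is meant to provide; recomputing the digest locally does not help a forger.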

Research and practice on key issues in the implementation of government data classification and grading in China

Yue WANG, Na SU

2024, 10(3): 16-26. doi:10.11959/j.issn.2096-0271.2024035

Data classification and grading is the foundation for ensuring the safe circulation of data and releasing its value. This paper focuses on the key task of government data classification and grading in digital reform. Using a theoretical case study method and drawing on plans publicly released by provincial governments and ministries, it systematically sorts out and quantitatively analyzes the implementation of government data classification and grading in China. The paper summarizes four key processes and five characteristics of this implementation. In view of the particular complexity of classifying and grading government data, it identifies four problems in the implementation in China (unclear overall target positioning, inconsistent classification and grading objects, classification and grading treated as separate tasks, and differing security grading standards) and proposes corresponding solutions. Based on the government data classification and grading practice of a national ministry, the paper verifies the soundness and effectiveness of the solutions, providing a reference for constructing a unified national government data classification and grading system.

Research and enlightenment on the construction mode of provincial government big data platform

Fan MENG, Qunli YANG, Yang GAO, Wenbin LI

2024, 10(3): 27-39. doi:10.11959/j.issn.2096-0271.2024022

Building a high-quality big data resource platform for government is an important foundational project for integrating government information across departments, regions, and levels, accelerating the construction of digital government, and raising the digital and intelligent level of public services and social governance. First, this paper reviews the development of China's e-government and summarizes the three construction modes of traditional big data platforms and their main problems, namely low data freshness, poor data consistency, difficult collaborative business management, weak foundational support, and high overall investment. Second, it conducts a case study of the credit information resource management platform in Jiangsu and explains why the Jiangsu case was chosen. The paper proposes corresponding solutions and an overall architecture design for the problems of traditional construction modes, and identifies four aspects in which the Jiangsu case has reference value. Finally, drawing on research and practical experience in Jiangsu, it offers five suggestions as a reference for provinces studying and formulating policy documents such as construction guidelines for provincial government big data platforms.

Research on the application of government big data platform based on federated learning

Jianping WU, Chaochao CHEN, Jiahe JIN, Chunming WU

2024, 10(3): 40-54. doi:10.11959/j.issn.2096-0271.2024032

At present, the construction of digital government has entered a "deep-water zone". The government big data platform, as a data foundation, supports various government information applications, and the security and compliance of its private data have drawn wide attention in the industry. Federated learning is an important method for breaking down data silos, so applying it to government big data platforms has high research value. First, the current status of government big data platforms and their federated learning applications was introduced. Then, the paper analyzed three major management challenges in the collection, classification and grading, and sharing of private data on government big data platforms. Further, solutions based on federated-learning recommendation algorithms and private set intersection techniques were explored. Finally, a summary and outlook for the future application of private data on government big data platforms were presented.
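Private set intersection lets two agencies find the records they share without revealing the rest. The sketch below shows one classic construction, Diffie-Hellman-style PSI based on commutative exponentiation; it is not necessarily the technique the paper surveys. Both parties are simulated in one process, and the modulus `P = 2**127 - 1` and the SHA-256 hash-to-group mapping are toy choices that are not secure parameters.

```python
import hashlib
import secrets

# Toy modulus (the Mersenne prime 2^127 - 1). Real deployments use a vetted group,
# e.g. an elliptic curve; this choice is for illustration only.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an identifier to a nonzero residue modulo P."""
    digest = int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest(), "big")
    return digest % (P - 1) + 1


def blind(values, key):
    """Raise every value to a secret exponent; exponentiation commutes across keys."""
    return [pow(v, key, P) for v in values]


def private_intersection(a_items, b_items):
    """Simulate both parties in one process and return A's items that B also holds."""
    key_a = secrets.randbelow(P - 2) + 1   # party A's secret exponent
    key_b = secrets.randbelow(P - 2) + 1   # party B's secret exponent
    # A -> B: H(x)^a ; B -> A: (H(x)^a)^b, order preserved so A can match positions.
    a_double = blind(blind([hash_to_group(x) for x in a_items], key_a), key_b)
    # B -> A: H(y)^b ; A computes (H(y)^b)^a locally.
    b_double = set(blind(blind([hash_to_group(y) for y in b_items], key_b), key_a))
    # (H^a)^b == (H^b)^a, so equal items yield equal double-blinded values.
    return [x for x, v in zip(a_items, a_double) if v in b_double]


print(private_intersection(["id-001", "id-002", "id-003"], ["id-003", "id-004", "id-001"]))
```

Each side only ever sees the other's values under a secret exponent, so non-matching identifiers stay hidden while matching ones collide after double blinding.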

"Data empowerment" drives the logic and path of intelligent government construction

Rui WANG, Zhen LIU

2024, 10(3): 55-64. doi:10.11959/j.issn.2096-0271.2024036

In the era of comprehensive social digitalization, the trend toward intelligent government is irresistible. Under the combined effect of technology and data, China's digital government construction is moving toward digitization, networking, and intelligence. Since data is the core resource of digital government construction, building an intelligent digital government requires fully leveraging the value of data. The logic by which data empowerment promotes digital government construction is as follows: data empowers the open operation of government; data empowers the holistic operation of government; data empowers the collaborative operation of government; and data empowers the scientific operation of government. The specific path for promoting digital government construction through data empowerment requires: ensuring the open sharing of data on the basis of the holistic operation of digital government; ensuring the unity and management of data based on the collaborative operation of digital government; ensuring the overall coordination and redistribution of data based on the openness of digital government; and ensuring the diversity of data sources based on the scientific nature of digital government.

STUDY

A survey of voice conversion based on non-parallel data

Pengcheng LI, Xulong ZHANG, Jianzong WANG, Ning CHENG, Jing XIAO

2024, 10(3): 65-81. doi:10.11959/j.issn.2096-0271.2024011

Voice conversion is a research topic spanning speech processing and artificial intelligence. Its goal is to change the timbre of speech while preserving the content of the source speech, so that the result sounds as if spoken by the target speaker, while ensuring both the quality and naturalness of the converted speech. Voice conversion based on non-parallel data currently attracts much attention: models are trained on non-parallel multilingual speaker datasets, enabling many-to-many and any-to-any conversion. This paper provides a comprehensive summary and analysis of recent developments in non-parallel voice conversion. First, we outline early voice conversion techniques based on parallel corpora and their limitations. Then, we introduce, compare, and analyze various approaches to voice conversion based on non-parallel data. Finally, we provide a summary and outlook on voice conversion technology.

Event causality identification network based on knowledge and syntactic structure

Shirui WANG, Bohan XIE, Ling DING, Jianting CHEN, Yang XIANG

2024, 10(3): 82-92. doi:10.11959/j.issn.2096-0271.2024008

Event causality identification is an important relation extraction task that has received much attention in recent years. Most existing methods treat syntactic structure and background knowledge separately. Early causality extraction methods focused on analysis at the level of syntactic structure. With the development of deep learning, methods that combine pre-trained models with background knowledge have become mainstream. However, neither kind of method fully integrates sentence information with external knowledge, resulting in varying degrees of information loss. To address this problem, we propose a novel event causality identification model that combines syntactic structure and background knowledge. Our model parses sentences into knowledge-syntactic graphs that contain both syntactic and knowledge information, and uses a graph convolutional network for information fusion. By considering both syntax and knowledge, it further enriches the event representation and performs effectively. On the widely used EventStoryLine dataset, our model achieves an F1 score of 0.445, a 2.3% improvement over existing methods.
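The fusion step relies on graph convolution. The sketch below shows a single standard GCN layer (propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W) with symmetric normalization) over a toy graph that mixes syntactic edges and knowledge edges. The graph, the one-hot features, and the weights are made up for illustration; the authors' actual graph construction, feature encoder, and network depth are not reproduced here.

```python
import math


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(edges, n):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph with n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, x) for x in row] for row in z]


# Toy knowledge-syntactic graph (hypothetical): tokens "storm"(0), "caused"(1),
# "flood"(2) plus a knowledge node "disaster"(3).
syntax_edges = [(0, 1), (1, 2)]       # dependency arcs
knowledge_edges = [(0, 3), (2, 3)]    # links to an external concept
a_norm = normalized_adjacency(syntax_edges + knowledge_edges, 4)
features = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # one-hot
weight = [[0.5, -0.2], [0.1, 0.4], [0.3, 0.3], [-0.1, 0.6]]
hidden = gcn_layer(a_norm, features, weight)
print([[round(x, 3) for x in row] for row in hidden])
```

Because "storm" and "flood" now share the knowledge node, a second layer would let information flow between them even though they are two hops apart in the dependency tree.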

Bootstrap sample partition data model and distributed ensemble learning

Kaijing LUO, Yuming ZHANG, Yulin HE, Zhexue HUANG

2024, 10(3): 93-108. doi:10.11959/j.issn.2096-0271.2024002

A sequential implementation of Bootstrap sampling and Bagging ensemble learning is computationally inefficient and does not scale to large Bagging ensembles with many component models. Inspired by distributed big data computing, a new Bootstrap sample partition (BSP) big data model and a distributed ensemble learning method for large-scale ensemble learning were proposed. The BSP data model represents a dataset as a set of Bootstrap samples stored in the Hadoop distributed file system. The distributed ensemble learning method randomly selects a subset of samples from the BSP data model and reads them into the Java virtual machines of the cluster. A serial algorithm is then executed in each virtual machine to build a machine learning model on each sample, independently and in parallel with the other virtual machines. Finally, all sub-results are collected and processed on the master node to produce the ensemble result; a sample preference strategy for the BSP data blocks can optionally be added. Both BSP data model generation and component model building use a non-MapReduce computing paradigm, and all component models are computed in parallel without data communication among nodes. The proposed algorithms were implemented in Spark as internal operators that can be used in Spark applications. Experiments demonstrate that the BSP data model of a dataset can be generated efficiently by the new distributed algorithm. The approach improves the reusability of data samples, increases computational efficiency by over 50% in large-scale Bagging ensemble learning, and improves prediction accuracy by approximately 2%.
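The BSP workflow (generate and store Bootstrap samples once, pick a random subset, train one component per block independently, combine by voting) can be sketched in plain Python. This is a single-machine illustration under stated assumptions: a Python list stands in for HDFS blocks, `ThreadPoolExecutor` stands in for Spark executors and their JVMs (CPython threads show the structure, not a CPU speedup), the nearest-centroid learner is a placeholder for any serial algorithm, and the optional sample preference strategy is omitted.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def make_bsp_blocks(data, num_blocks, seed=0):
    """Pre-generate Bootstrap samples (draws with replacement, same size as data)."""
    blocks = []
    for b in range(num_blocks):
        rng = random.Random(seed * 100_003 + b)   # independent stream per block
        blocks.append([data[rng.randrange(len(data))] for _ in range(len(data))])
    return blocks


def train_nearest_centroid(block):
    """Serial learner run on one block: per-class mean feature vectors."""
    sums, counts = {}, Counter()
    for x, y in block:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_one(model, x):
    """Label of the closest centroid (squared Euclidean distance)."""
    return min(model, key=lambda y: sum((c - v) ** 2 for c, v in zip(model[y], x)))


def train_ensemble(blocks, k, seed=0, workers=4):
    """Pick k stored blocks at random; train one component per block in parallel."""
    chosen = random.Random(seed).sample(blocks, k)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(train_nearest_centroid, chosen))


def ensemble_predict(models, x):
    """Bagging combination step: majority vote over component predictions."""
    return Counter(predict_one(m, x) for m in models).most_common(1)[0][0]


data = ([((i * 0.1, -i * 0.1), 0) for i in range(10)]
        + [((5 + i * 0.1, 5 - i * 0.1), 1) for i in range(10)])
blocks = make_bsp_blocks(data, num_blocks=10, seed=1)
models = train_ensemble(blocks, k=5, seed=2)
print(ensemble_predict(models, (0.2, -0.1)), ensemble_predict(models, (5.1, 4.9)))
```

Since the blocks are generated once and kept, later ensembles can reuse them by sampling a different subset, which mirrors the sample-reusability benefit described in the abstract.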

Deep reinforcement learning news recommendation based on dynamic action coverage

Xianghong DONG, Junxiu AN

2024, 10(3): 109-118. doi:10.11959/j.issn.2096-0271.2023069

News recommendation systems play an important role in news dissemination on new media. This paper proposed a recommendation system based on deep reinforcement learning that combines the representation ability of neural networks with the policy selection ability of reinforcement learning to improve news recommendation. The system uses dynamic action masks to better capture users' short-term interests, an optimized experience cache mechanism to use stored experience more efficiently, and a reward design based on regional masking to accelerate model training, improving recommendation performance in the news domain. Experimental results show that the accuracy of the proposed model on news datasets is comparable to that of current mainstream neural-network recommendation methods, while its ranking performance is better.
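The abstract does not spell out the masking rule, so the following is only a minimal sketch of how a dynamic action mask typically plugs into a value-based agent: invalid candidates are removed before either greedy or exploratory selection. The mask criteria (skip already-shown articles and anything older than a freshness window) are hypothetical, and the Q-values are placeholder numbers rather than outputs of the paper's network.

```python
import random


def build_action_mask(candidate_ids, shown_ids, ages_hours, max_age_hours):
    """Hypothetical mask: allow only unseen articles younger than max_age_hours."""
    return [cid not in shown_ids and age <= max_age_hours
            for cid, age in zip(candidate_ids, ages_hours)]


def masked_epsilon_greedy(q_values, mask, epsilon, rng):
    """Choose an action index among unmasked candidates only."""
    valid = [i for i, allowed in enumerate(mask) if allowed]
    if not valid:
        raise ValueError("every candidate action is masked")
    if rng.random() < epsilon:
        return rng.choice(valid)                  # explore within the valid set
    return max(valid, key=lambda i: q_values[i])  # exploit: best unmasked Q-value


candidates = ["n1", "n2", "n3", "n4"]
q_values = [0.9, 0.5, 0.7, 0.1]   # e.g. Q-network outputs for one user state
mask = build_action_mask(candidates, shown_ids={"n1"}, ages_hours=[1, 2, 3, 100],
                         max_age_hours=48)
print(mask)  # [False, True, True, False]
print(candidates[masked_epsilon_greedy(q_values, mask, epsilon=0.0, rng=random.Random(0))])  # n3
```

Although "n1" has the highest Q-value, it is masked because the user has already seen it, so the agent recommends "n3". Rebuilding the mask at every step is what makes it dynamic.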

Multi-teacher distillation BERT model in NLU tasks

Jialai SHI, Weibin GUO

2024, 10(3): 119-132. doi:10.11959/j.issn.2096-0271.2023039

Knowledge distillation is a model compression scheme commonly used to address the large size and slow inference of BERT and other deep pre-trained models. "Multi-teacher distillation" can further improve the performance of the student model, but the traditional "one-to-one" mapping, which forcibly assigns selected intermediate layers of the teacher model to student layers, discards most of the intermediate features. A "one-to-many" mapping method is proposed to solve the problem that intermediate layers cannot be aligned during knowledge distillation, helping the student learn the syntactic, coreference, and other knowledge contained in the teacher's intermediate layers. Experiments on several GLUE datasets show that the student model retains 93.9% of the teacher models' average inference accuracy while having only 41.5% of their average parameter count.
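One plausible reading of the "one-to-many" mapping is that each student layer is matched against a contiguous group of teacher layers, so every teacher layer contributes to the loss instead of being skipped. The sketch below implements that reading; the grouping scheme, the mean-squared-error loss on hidden states, the equal weighting of teachers, and the assumption that student and teacher share a hidden size (real BERT distillation usually adds a learned projection when widths differ) are all illustrative choices, not the authors' exact formulation.

```python
def one_to_many_mapping(num_student, num_teacher):
    """Split teacher layers into contiguous groups, one group per student layer."""
    if num_teacher < num_student:
        raise ValueError("expected at least as many teacher layers as student layers")
    base, extra = divmod(num_teacher, num_student)
    groups, start = [], 0
    for i in range(num_student):
        size = base + (1 if i < extra else 0)   # spread the remainder over early groups
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def mse(u, v):
    """Mean squared error between two equal-length hidden vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def intermediate_loss(student_layers, teachers):
    """Average one-to-many layer loss over several teachers (equal weights assumed)."""
    total = 0.0
    for teacher_layers in teachers:
        groups = one_to_many_mapping(len(student_layers), len(teacher_layers))
        per_teacher = sum(
            sum(mse(s, teacher_layers[t]) for t in group) / len(group)
            for s, group in zip(student_layers, groups)
        )
        total += per_teacher / len(student_layers)
    return total / len(teachers)


print(one_to_many_mapping(4, 12))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]

student = [[1.0, 2.0], [3.0, 4.0]]
teacher_a = [[1.0, 2.0], [1.0, 2.0], [3.0, 4.0], [3.0, 4.0]]
teacher_b = [[1.0, 2.0], [1.0, 2.0], [3.0, 4.0], [5.0, 4.0]]
print(intermediate_loss(student, [teacher_a, teacher_b]))  # 0.25
```

With a one-to-one mapping, a 4-layer student would see only 4 of a 12-layer teacher's layers; here all 12 appear in some group.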

Spectral clustering ensemble algorithm based on three-order tensor for large-scale data

Yunzheng WU, Tao DU, Jin ZHOU, Di CHEN, Xingeng WANG

2024, 10(3): 133-148. doi:10.11959/j.issn.2096-0271.2024007

To reduce the computational burden of spectral clustering on large-scale data and further improve clustering accuracy and robustness, a spectral clustering ensemble algorithm for large-scale data based on a third-order tensor was proposed. First, a sparse affinity sub-matrix was constructed using a mixed representative nearest-neighbor approximation. The sparse affinity sub-matrix was then represented as a bipartite graph, and preliminary clustering results were obtained by graph segmentation. Finally, a unified clustering result was obtained by fusing multiple clustering results with a third-order tensor ensemble method. On both real and synthetic datasets, the proposed algorithm performed better than the classical spectral clustering algorithm, a clustering ensemble algorithm, and recently improved algorithms.
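The ensemble stage can be pictured as stacking one n x n co-association slice per base clustering into an n x n x M third-order tensor. The sketch below builds that tensor and then, as a deliberately simplified stand-in for the paper's tensor ensemble method (which is not reproduced here), fuses it by averaging along the third mode and linking points that share a cluster in more than half of the runs. The base clusterings are hypothetical outputs of the bipartite-graph step.

```python
def connectivity_matrix(labels):
    """n x n slice: 1.0 where two points share a cluster in one base clustering."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)] for i in range(n)]


def build_tensor(base_labelings):
    """Stack slices into an n x n x M third-order tensor, indexed tensor[i][j][m]."""
    slices = [connectivity_matrix(labels) for labels in base_labelings]
    n = len(base_labelings[0])
    return [[[s[i][j] for s in slices] for j in range(n)] for i in range(n)]


def fuse(tensor, threshold=0.5):
    """Average along the third mode, then take connected components above threshold."""
    n = len(tensor)
    avg = [[sum(tensor[i][j]) / len(tensor[i][j]) for j in range(n)] for i in range(n)]
    labels, current = [-1] * n, 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                      # depth-first search over strong links
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and avg[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three base clusterings of six points (hypothetical). Cluster ids need not agree
# across runs; the pairwise slices make the fusion independent of label names.
base = [[0, 0, 0, 1, 1, 1],
        [1, 1, 1, 0, 0, 0],
        [0, 0, 1, 1, 2, 2]]
print(fuse(build_tensor(base)))  # [0, 0, 0, 1, 1, 1]
```

The third base clustering disagrees with the other two, but the fused result follows the majority because each pairwise decision is averaged across all slices.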

FORUM

Research on the national defense cyber security and data governance

Pengyun QI

2024, 10(3): 149-162. doi:10.11959/j.issn.2096-0271.2023038

Improving and perfecting China's national defense cyber and data security governance structure is not only an important part of domestic network and data security governance, but also a key practice in a specific data field under the framework of the “Data Security Law”. Through comparative analysis and literature analysis, we extract the logical characteristics of the U.S. National Defense Authorization Acts of 2013-2022, whose successful experience can inform the improvement of China's defense cyber and data security framework. Guided by the holistic approach to national security, with traditional and non-traditional security construction as its core elements, we can improve special legislation on national defense network and data security and build a “double interaction” system of government-civilian early-warning interaction awareness and government-enterprise cooperative interaction layout, so as to improve the strategic pattern of China's defense network security and data governance.

EXPERT VIEW

On public data

2024, 10(3): 163-167. doi:10.11959/j.issn.2096-0271.2024037

COLUMN: LOCAL GOVERNMENT BIG DATA

Shandong’s exploration and construction on the application of data innovation

2024, 10(3): 168-174. doi:10.11959/j.issn.2096-0271.2023023

An Elementarisation Method for Public Data Based on Urban Knowledge Systems

ZHENG Yu, YI Xiuwen, QI Dekang, PAN Zheyi

doi: 10.11959/j.issn.2096-0271.2024042

Online First: 2024-06-04

Multi-teacher distillation BERT model in NLU tasks

SHI Jialai, GUO Weibin

doi: 10.11959/j.issn.2096-0271.2023039

Online First: 2023-05-05

An Efficient and Robust Multi-scenario Artificial Intelligent Medical Model based on Metaverse

ZHU Jiuwen, ZHOU Yubing, SI Hongbiao, ZHANG Xulong, XU Liang

doi: 10.11959/j.issn.2096-0271.2023006

Online First: 2023-02-14

2024 Vol.10	No.3	No.2	No.1
2023 Vol.9	No.6	No.5	No.4	No.3	No.2	No.1
2022 Vol.8	No.6	No.5	No.4	No.3	No.2	No.1
2021 Vol.7	No.6	No.5	No.4	No.3	No.2	No.1
2020 Vol.6	No.6	No.5	No.4	No.3	No.2	No.1
2019 Vol.5	No.6	No.5	No.4	No.3	No.2	No.1
2018 Vol.4	No.6	No.5	No.4	No.3	No.2	No.1
2017 Vol.3	No.6	No.5	No.4	No.3	No.2	No.1
2016 Vol.2	No.6	No.5	No.4	No.3	No.2	No.1
2015 Vol.1	No.4	No.2	No.3	No.1

Special Essay Solicitation for Big Data Research: Metaverse Big Data

Digital economics in metaverse: state-of-the-art, characteristics, and vision
