
Current Issue

    15 May 2020, Volume 6 Issue 3
    TOPIC: EXPLORATION ON DATA ASSETIZATION
    An initial exploration on framework of data assetization
    Yazhen YE, Guohua LIU, Yangyong ZHU
    2020, 6(3):  3-12.  doi:10.11959/j.issn.2096-0271.2020019

    As the digital economy develops, data, as one of its key elements, has been widely recognized as a new type of asset. However, not all data can be treated as an asset. Therefore, the criteria for data assets and the transformation from datasets to data assets are critical questions for the data industry and the actors involved in the data economy. The features and requirements of data assets were discussed, and a basic framework for data assetization based on these features was proposed, comprising five phases: registration of rights over data resources, data value confirmation and quality control, data box building and storage, asset pricing and evaluation, and data asset depreciation and appreciation management. A plausible path to data resource assetization was provided.

    Data assets value evaluation model based on profit maximization
    Xiangqian DONG, Bing GUO, Yan SHEN, Xuliang DUAN, Yuncheng SHEN, Hong ZHANG
    2020, 6(3):  13-20.  doi:10.11959/j.issn.2096-0271.2020020

    The idea that data is valuable and will become an economic commodity has become a consensus. However, the non-rivalry of data makes its value different from that of tangible assets. A correct understanding of the value of data is the premise of and guarantee for realizing data sharing and exchange and for developing the digital economy. Firstly, the evaluation methods for data value and the commodity attributes of data assets were analyzed. Then the market model of data asset transactions was discussed. Finally, a profit model for participants was proposed based on a comprehensive review of data asset attributes and the market model. The profit of participants in the data market was modeled, and a guide for participants entering the market was provided.
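
    The flavor of profit-oriented pricing can be sketched with a toy model. This is not the model from the paper: the linear demand curve, its coefficients, and the grid-search approach are all illustrative assumptions.

```python
# Hypothetical sketch: a data seller's profit as a function of price,
# assuming a simple linear demand curve D(p) = a - b*p (not from the paper).

def profit(price, a=100.0, b=2.0, marginal_cost=5.0):
    """Profit = (price - marginal_cost) * demand, with demand floored at zero."""
    demand = max(a - b * price, 0.0)
    return (price - marginal_cost) * demand

# Grid search over candidate prices for the profit-maximizing one.
best_price = max((p / 10 for p in range(0, 501)), key=profit)
print(best_price)  # analytic optimum is (a + b*c) / (2b) = 27.5
```

    Because data is non-rival, the `marginal_cost` of serving one more buyer is typically near zero, which is exactly why data pricing differs from pricing tangible assets.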

    Blockchain based data marketplace
    Jingwei WANG, Zhenzhe ZHENG, Fan WU, Guihai CHEN
    2020, 6(3):  21-35.  doi:10.11959/j.issn.2096-0271.2020021

    Blockchain is a decentralized, distributed data storage technology. Blockchain can overcome the disadvantages of centralized data markets; however, distributed data marketplaces also introduce security and privacy problems. Firstly, the current status in industry and the progress in academia of big data marketplaces were reviewed, and the properties that a well-developed blockchain-based data marketplace should satisfy were proposed. Based on these properties, a blockchain-based data marketplace framework was proposed. Then the potential security and privacy problems in this framework were investigated, and corresponding solutions were designed. Based on this framework, a data marketplace demonstration system was implemented, and its feasibility and security were verified.
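
    The tamper-evidence idea underlying such a marketplace can be illustrated with a toy hash-chained ledger of data-product listings. The class and field names here are hypothetical, not the framework proposed in the paper.

```python
import hashlib
import json

def block_hash(block):
    """Deterministic SHA-256 over the block's canonical JSON form."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

class ListingChain:
    """Toy ledger: each block records one data-product listing and links
    to its predecessor by hash, so any edit to history is detectable."""

    def __init__(self):
        self.chain = [{"index": 0, "prev": "0" * 64, "listing": None}]  # genesis

    def add_listing(self, seller, dataset_id, price):
        block = {
            "index": len(self.chain),
            "prev": block_hash(self.chain[-1]),  # link to previous block
            "listing": {"seller": seller, "dataset": dataset_id, "price": price},
        }
        self.chain.append(block)

    def verify(self):
        # Recompute every link; a tampered block breaks the chain after it.
        return all(
            self.chain[i]["prev"] == block_hash(self.chain[i - 1])
            for i in range(1, len(self.chain))
        )

chain = ListingChain()
chain.add_listing("alice", "weather-2020", 120)
chain.add_listing("bob", "traffic-logs", 80)
assert chain.verify()
chain.chain[1]["listing"]["price"] = 1  # tampering...
assert not chain.verify()               # ...is detected
```

    A real blockchain marketplace adds distributed consensus and access control on top of this linking idea; the sketch shows only why retroactive edits to listings become evident.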

    Research status and suggestions on data asset standardization
    Bingrong DAI, Shanshan BI, Lin YANG, Tingting JI, Mei CHEN
    2020, 6(3):  36-44.  doi:10.11959/j.issn.2096-0271.2020022

    Data is considered the most valuable asset of many organizations, and research on data assets has been valued by countries, industries, and organizations alike. From the perspective of standardization, the theoretical research and practice on data assets by relevant organizations at home and abroad were introduced, along with the progress of standardization research on data assets. A standardization approach for data assets and the basic requirements of data asset management were put forward, providing a reference for the management and application of data assets.

    Construction of a data asset management system oriented to value realization
    Yufei LI, Haiyan LIU, Shu YAN
    2020, 6(3):  45-56.  doi:10.11959/j.issn.2096-0271.2020023

    In the era of the digital economy, data is increasingly becoming an important strategic asset for enterprises, but the lack of data asset management capability is becoming a key issue that restricts companies' ability to derive value from their data. By sorting out the evolution of data asset management, its related concepts were clarified, the current state of the data asset management industry was analyzed, the design ideas and main contents of a data asset management system oriented to value realization were explained, and a complete data asset management system was presented. The important role of data operations was emphasized, a practical path for the data asset management system was established, and the development trend of data asset management was summarized.

    TOPIC: DATAFLOW COMPUTING TECHNIQUES FOR BIG DATA PROCESSING
    A survey of dataflow programming models and tools for big data processing
    Xiaofeng ZOU, Wangdong YANG, Xuecheng RONG, Kenli LI, Keqin LI
    2020, 6(3):  59-72.  doi:10.11959/j.issn.2096-0271.2020024

    Data mining and intelligent analysis of large volumes of static data on big data computing platforms have promoted the application of big data and artificial intelligence. Facing the growing demand for real-time processing of the dynamic data generated by the Internet of things, dataflow computing has gradually been introduced into big data processing platforms. Focusing on dataflow programming models, the traditional software engineering methods for dataflow analysis were compared with the structure definitions and model references provided by current dataflow programming models for big data processing platforms; their differences and shortcomings were analyzed, and the main features and key elements of the dataflow programming model were summarized. The main methods of dataflow programming and their combination with mainstream programming tools were analyzed, and the basic framework and programming mode of visual dataflow programming tools were presented according to the dataflow computing requirements of big data processing.
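
    The core idea of a dataflow programming model is that a program is a graph of operators through which data items flow, built declaratively and executed lazily. The following is a minimal illustrative sketch; the operator names are hypothetical, not from any surveyed tool.

```python
# Minimal dataflow sketch: operators form a graph (source -> map -> filter),
# and execution pulls items through the edges lazily via generators.

class Source:
    def __init__(self, items):
        self.items = items
    def run(self):
        yield from self.items

class Map:
    def __init__(self, upstream, fn):
        self.upstream, self.fn = upstream, fn
    def run(self):
        return (self.fn(x) for x in self.upstream.run())

class Filter:
    def __init__(self, upstream, pred):
        self.upstream, self.pred = upstream, pred
    def run(self):
        return (x for x in self.upstream.run() if self.pred(x))

# Building the graph does no work; run() drives the pipeline.
pipeline = Filter(Map(Source(range(10)), lambda x: x * x),
                  lambda x: x % 2 == 0)
print(list(pipeline.run()))  # [0, 4, 16, 36, 64]
```

    Real dataflow engines take the same declarative graph but distribute the operators across workers and add scheduling, fault tolerance, and back-pressure.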

    Dataflow model and its applications in big data processing
    Nifei BI, Guangyao DING, Qihang CHEN, Chen XU, Aoying ZHOU
    2020, 6(3):  73-86.  doi:10.11959/j.issn.2096-0271.2020025

    Unbounded, unordered, and large-scale datasets have become increasingly common in recent years. Meanwhile, the processing requirements of data consumers are becoming more sophisticated, involving event time, windows, and latency. To address these evolving requirements on unbounded, unordered, large-scale datasets, the dataflow model in big data processing was introduced. On one hand, the dataflow graph of the model was analyzed at the level of the execution engine; on the other hand, its dataflow programming model was analyzed at the level of unified programming. Furthermore, the different implementations of the dataflow graph and the dataflow programming model in multiple execution engines were analyzed, including Spark, a batch processing engine, and Flink, a stream processing engine.
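
    Event-time windowing, one of the requirements the dataflow model addresses, can be sketched as follows: each record carries its own event time and is assigned to a fixed-size window by that time, regardless of arrival order. The window size and data are illustrative assumptions.

```python
from collections import defaultdict

def tumbling_window_sum(events, size=10):
    """Assign (event_time, value) records to fixed tumbling windows by
    event time and sum the values per window. Arrival order is irrelevant."""
    windows = defaultdict(int)
    for event_time, value in events:
        window_start = (event_time // size) * size
        windows[window_start] += value
    return dict(sorted(windows.items()))

# Records arrive out of event-time order, as in a real stream.
events = [(3, 1), (25, 4), (7, 2), (12, 3)]
print(tumbling_window_sum(events))  # {0: 3, 10: 3, 20: 4}
```

    Engines such as Flink add watermarks on top of this idea to decide when a window's result may be emitted despite possible late records.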

    State-of-the-art research on cluster resource management in the dataflow computing model
    Xiaochun TANG, Ying FU, Zhao DING, Anqi MAO, Zhanhuai LI
    2020, 6(3):  87-100.  doi:10.11959/j.issn.2096-0271.2020026

    The development of cluster-based high-performance computing has undergone three stages of evolution. With the widespread use of dataflow programming models such as Spark and Flink in big data computing, ensuring that various dataflow computing applications share cluster resources fairly is extremely important; it is also a main means of reducing infrastructure costs. As the drawbacks of traditional cluster resource management have become increasingly apparent under the dataflow computing model, many alternative approaches to cluster resource management, including HoD, centralized scheduling, two-level scheduling, distributed scheduling, and hybrid scheduling, have been proposed in recent years. Their respective advantages and disadvantages were introduced, providing a reference for the use and research of cluster resource management and scheduling in dataflow computing environments.
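
    The fair-share goal that these schedulers pursue is often formalized as max-min fairness: repeatedly satisfy the smallest demand in full, then split what remains among the rest. A minimal sketch, with illustrative job names and a single-resource capacity (real schedulers handle multiple resource types):

```python
def max_min_fair(demands, capacity):
    """Max-min fair allocation of one resource: grant each user the lesser
    of its demand and an equal share of what remains, smallest demand first."""
    allocation = {}
    remaining = capacity
    pending = sorted(demands.items(), key=lambda kv: kv[1])
    while pending:
        user, demand = pending.pop(0)
        fair_share = remaining / (len(pending) + 1)
        grant = min(demand, fair_share)   # never give more than demanded
        allocation[user] = grant
        remaining -= grant
    return allocation

# jobA's small demand is met fully; jobB and jobC split the remainder.
print(max_min_fair({"jobA": 2, "jobB": 6, "jobC": 8}, capacity=12))
```

    Centralized, two-level, and distributed schedulers differ mainly in *where* a computation like this runs and how fresh its view of cluster state is, not in the fairness objective itself.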

    Survey on data caching technology for distributed dataflow systems
    Xuchu YUAN, Guo FU, Jize BI, Yanfeng ZHANG, Tiezheng NIE, Yu GU, Yubin BAO, Ge YU
    2020, 6(3):  101-116.  doi:10.11959/j.issn.2096-0271.2020027

    The dataflow model is adopted by several dataflow systems for its advantages of high parallelism, pipelined processing, and functional programming. In distributed and heterogeneous dataflow systems, the speed mismatch between data production at source operators and data consumption at sink operators can delay data and leave operators idle. To support an efficient dataflow system, a dataflow cache system is needed to ensure efficient caching and movement of data. Several distributed dataflow systems and distributed message queuing systems were analyzed, and the degree to which current message queuing systems support dataflow caching was summarized. Finally, caching techniques were introduced, and the demands on and research directions for future dataflow caching systems were analyzed.
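
    The speed-mismatch problem and the role of a cache between operators can be sketched with a bounded buffer: a fast source blocks when the buffer fills (back-pressure), while a slower sink drains it. In a real system a message queue plays the role of `buffer`; everything here is illustrative.

```python
import queue
import threading

# Bounded buffer between a fast source operator and a slow sink operator.
buffer = queue.Queue(maxsize=4)   # a full buffer back-pressures the source
results = []

def source():
    for i in range(10):
        buffer.put(i)             # blocks while the buffer is full
    buffer.put(None)              # end-of-stream marker

def sink():
    while True:
        item = buffer.get()
        if item is None:
            break
        results.append(item * 2)  # the "slow" consumer's work

t1 = threading.Thread(target=source)
t2 = threading.Thread(target=sink)
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [0, 2, 4, ..., 18], in order despite the rate mismatch
```

    A distributed cache must additionally persist the buffered data and survive operator failures, which is where message queuing systems come in.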

    The usage of dataflow model in GPU and big data processing
    Huayou SU, Songzhu MEI, Rongchun LI, Yong DOU
    2020, 6(3):  117-128.  doi:10.11959/j.issn.2096-0271.2020028

    The dataflow model is an efficient computing model that has been widely used in software and hardware thanks to its natural advantages in parallelism. In terms of hardware architecture, the dataflow model has led computer architecture from the traditional von Neumann architecture toward higher concurrency; stream processors based on long-vector processing units and SIMT GPUs are two instances of dataflow technology. In terms of programming models, dataflow ideas are widely used in big data programming models such as MapReduce and Spark. The architecture of NVIDIA GPUs and the CUDA programming model were analyzed from the perspective of the dataflow model. The application and trends of dataflow and GPUs in big data processing were analyzed, providing ideas and methods for applying GPU-based systems to big data processing.
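
    The MapReduce dataflow mentioned above can be sketched in three phases: map emits (key, value) pairs, a shuffle groups them by key, and reduce aggregates each group. This shows the programming-model idea only, not a GPU implementation; the example data is made up.

```python
from collections import defaultdict
from itertools import chain

def map_phase(lines):
    """Map: emit (word, 1) for every word in every input line."""
    return chain.from_iterable(((w, 1) for w in line.split()) for line in lines)

def shuffle(pairs):
    """Shuffle: group all values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values independently (hence in parallel)."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data flows", "data flows fast"]
print(reduce_phase(shuffle(map_phase(lines))))
# {'big': 1, 'data': 2, 'flows': 2, 'fast': 1}
```

    The dataflow character is that each phase only consumes its predecessor's output, so map tasks and reduce tasks carry no shared state and can be scheduled freely across workers, or across GPU threads.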
