Top Read Articles

    Big data technologies forward-looking
    Hong MEI, Xiaoyong DU, Hai JIN, Xueqi CHENG, Yunpeng CHAI, Xuanhua SHI, Xiaolong JIN, Yasha WANG, Chi LIU
    Big Data Research    2023, 9 (1): 1-20.   DOI: 10.11959/j.issn.2096-0271.2023009
    Abstract views: 2610 | HTML views: 970 | PDF (1087 KB) downloads: 1541

    Major countries around the world attach great importance to the development of big data technology, and China has likewise made big data a national strategy of great long-term significance. Big data technologies cover data collection, transmission, management, processing, analysis, and application, forming a data life cycle together with the data governance related to each stage. Four areas, big data management, processing, analysis, and governance, were selected to identify the gap between China and the rest of the world. On the other hand, driven by diverse successful big data applications, the system architecture of computing technology is being restructured. In the shift from “computation-centric” to “data-centric”, fundamental computing theories and core technologies need to be redesigned, so a new type of big data system technology is becoming an important research direction. Against this background, four technical challenges and ten future development trends of big data technologies were identified.

    Threats and defenses of federated learning: a survey
    Jianhan WU, Shijing SI, Jianzong WANG, Jing XIAO
    Big Data Research    2022, 8 (5): 12-32.   DOI: 10.11959/j.issn.2096-0271.2022038
    Abstract views: 1760 | HTML views: 256 | PDF (2537 KB) downloads: 1948

    With the widespread application of machine learning, data security incidents occur from time to time and the demand for privacy protection is growing, which reduces the possibility of data sharing between different entities, makes it difficult to use data fully, and gives rise to data islands. Federated learning (FL), as an effective way to solve the data island problem, is essentially distributed machine learning. Its defining characteristic is that user data stay local, so the joint model training process does not leak the sensitive data of any partner. Nevertheless, federated learning in practice still faces many security risks that need further study. The possible attacks and the corresponding defense measures in federated learning were surveyed comprehensively and systematically. Firstly, the possible attacks and threats were classified according to the training stages of federated learning, the common attack methods in each category were enumerated, and the principles behind the corresponding attacks were introduced. Then the specific defense measures against these attacks and threats were summarized together with an analysis of their principles, providing a detailed reference for researchers entering this field. Finally, future work in this research area was highlighted, and several directions that need attention were pointed out to help improve the security of federated learning.
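    As background for the attack surface described above, the sketch below shows the basic federated averaging loop in which clients train on private data and only model weights are exchanged. It is a minimal illustration of the paradigm, not the protocol analyzed in the paper; the linear model, client data, and hyperparameters are invented for the demo.

```python
import numpy as np

# Minimal federated averaging (FedAvg) sketch: each client trains a linear
# model on its private data and only the weight updates leave the device.
def local_update(weights, X, y, lr=0.1, epochs=5):
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    # Clients return locally trained weights; the server averages them,
    # weighted by local dataset size. Raw data never reaches the server.
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    return np.average(local_ws, axis=0, weights=sizes)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):                           # three simulated clients
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(20):                          # 20 communication rounds
    w = federated_round(w, clients)
print("recovered weights:", w)
```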

    Features and transaction modes of data products in data markets
    Lihua HUANG, Yifan DOU, Mengke GUO, Qifeng TANG, Gen LI
    Big Data Research    2022, 8 (3): 3-14.   DOI: 10.11959/j.issn.2096-0271.2022045
    Abstract views: 1296 | HTML views: 284 | PDF (1700 KB) downloads: 1654

    Developing the market of data as a factor of production is the key to the efficient allocation of the data factor. However, early practices in China's data markets have revealed a series of problems, which urgently calls for a systematic review and analysis of the theoretical mechanisms of data markets. The circulation process of data products was analyzed from several perspectives, including transaction cost theory, the electronic market framework, and electronic trading modes. It was further proposed that the effects of data computability are two-fold. On the one hand, computability enables data to be analyzed so as to fit the specific demands of certain industries. On the other hand, computability is also likely to remove the data transaction process from the market, a phenomenon known as platform disintermediation. Based on the classical theoretical framework of the electronic market, the offerings of data products were divided into four quadrants and analyzed accordingly. Finally, suggestions for data product suppliers and data transaction platform providers were put forward.

    Survey on federated recommendation systems
    Zhitao ZHU, Shijing SI, Jianzong WANG, Jing XIAO
    Big Data Research    2022, 8 (4): 105-132.   DOI: 10.11959/j.issn.2096-0271.2022032
    Abstract views: 1087 | HTML views: 137 | PDF (2663 KB) downloads: 1164

    In the federated learning (FL) paradigm, the original data are stored on independent clients while masked data are sent to a central server for aggregation, which offers a novel design approach for numerous domains. Given the wide application of recommendation systems (RS) in diverse domains, combining RS with FL techniques has been gaining momentum as a way to reduce computational cost, enable cross-domain recommendation, and protect users' privacy while maintaining recommendation performance comparable to traditional RS. Federated learning-based recommendation systems of recent years were comprehensively summarized. The differences between traditional and federated recommendation systems were analyzed, and the main research directions and progress of federated recommendation systems were demonstrated with comparison and analysis. Firstly, traditional recommendation systems and their bottlenecks were summarized. Then the federated learning paradigm was introduced. Furthermore, the advantages of combining federated learning with recommendation systems were described in two aspects, privacy protection and the use of multi-domain user information, along with the technical challenges of the combination. At the same time, existing deployments of federated recommendation systems were illustrated in detail. Finally, future research on federated recommendation systems was prospected and summarized.
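    One concrete way to federate a recommender, sketched below under simplifying assumptions, is federated matrix factorization: each client keeps its ratings and user embedding locally and uploads only gradients for the shared item embeddings. The model sizes, learning rates, and toy ratings are assumptions for illustration, not a system from the survey.

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, dim = 4, 6, 3
item_emb = rng.normal(scale=0.1, size=(n_items, dim))   # shared, server-side
user_embs = rng.normal(scale=0.1, size=(n_users, dim))  # private, one per client

# Each client's ratings stay local: {item_id: rating}
local_ratings = [{0: 5.0, 2: 1.0}, {1: 4.0, 3: 2.0}, {2: 5.0, 4: 3.0}, {0: 4.0, 5: 1.0}]

def client_step(u, ratings, items, lr=0.05):
    # Update the private user embedding locally and return only the
    # gradients of the item embeddings this client interacted with.
    item_grads = {}
    for i, r in ratings.items():
        err = u @ items[i] - r
        item_grads[i] = err * u
        u -= lr * err * items[i]
    return u, item_grads

for _ in range(200):                                     # training rounds
    agg = np.zeros_like(item_emb)
    counts = np.zeros(n_items)
    for uid, ratings in enumerate(local_ratings):
        user_embs[uid], grads = client_step(user_embs[uid], ratings, item_emb)
        for i, g in grads.items():
            agg[i] += g
            counts[i] += 1
    mask = counts > 0                                    # server averages gradients
    item_emb[mask] -= 0.05 * agg[mask] / counts[mask, None]

pred = user_embs[0] @ item_emb[0]
print("client 0 predicted rating for item 0:", round(float(pred), 2))
```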

    Value chain model of data governance and its application on data governance regulation analysis
    Keman HUANG, Xiaoyong DU
    Big Data Research    2022, 8 (4): 3-16.   DOI: 10.11959/j.issn.2096-0271.2022062
    Abstract views: 918 | HTML views: 351 | PDF (1444 KB) downloads: 972

    Cultivating the data marketplace is an important mechanism for realizing the value of big data, and the prosperity of such a marketplace requires a sustainable and healthy data service ecosystem. A data governance value chain model was developed to identify the primary and supporting activities for releasing data value. A data service ecosystem model was then built accordingly to distinguish the different stakeholders and the core functions that a data marketplace should have. Using the data governance value chain model and the data service ecosystem model, data governance regulation was analyzed systematically, with the aim of providing suggestions to promote the growth of the data marketplace.

    Digital economics in metaverse: state-of-the-art, characteristics, and vision
    Chenhuizi WANG, Wei CAI
    Big Data Research    2022, 8 (3): 140-150.   DOI: 10.11959/j.issn.2096-0271.2022048
    Abstract views: 840 | HTML views: 166 | PDF (1379 KB) downloads: 1072

    Metaverse became a very popular technology buzzword at the end of 2021, after Facebook changed its name to Meta to signal its long-term commitment to the metaverse. Firstly, the technical development process was reviewed to expound the inevitability and necessity of the metaverse. Afterward, the risks and challenges of the decentralized digital economy were revealed through an analysis of the overseas metaverse digital economy. Lastly, it was pointed out that the key spiritual core of decentralization lies in a global anti-monopoly ideology, and the future of the domestic metaverse industry was envisioned.

    Developing Data Factor Market
    Big Data Research    2022, 8 (3): 1-2.   DOI: 10.11959/j.issn.2096-0271.2022045-1
    Abstract views: 803 | HTML views: 222 | PDF (888 KB) downloads: 467
    Data-Commerce-Ecosystem: data goods, data businessman and data commerce
    Yazhen YE, Yangyong ZHU
    Big Data Research    2023, 9 (1): 111-125.   DOI: 10.11959/j.issn.2096-0271.2023003
    Abstract views: 786 | HTML views: 160 | PDF (1288 KB) downloads: 583

    With the progress of the data factor market, the concept of the Data-Commerce-Ecosystem (DCE) has attracted wide attention. However, there has been little discussion of the connotation of the DCE or of its role and responsibilities in the modern economy, which hinders the formation of a data trade ecosystem. Possible categories of contemporary data goods, data businessmen, and data commerce were discussed, together with proposed definitions of these concepts. Information goods, digital goods, and data goods were incorporated into the concept of data goods. Data businessmen were categorized into three groups based on their commerce models: data suppliers, data service providers, and data commodity traders. Several DCE models were summarized: the self-produce-and-market model, the operation platform agent model, and the data marketplace model. These discussions enrich the connotation of the DCE and in turn provide theoretical support for the development of the data factor market.

    Rhythm dancer: 3D dance generation by keymotion transition graph and pose-interpolation network
    Yayun HE, Junqing PENG, Jianzong WANG, Jing XIAO
    Big Data Research    2023, 9 (1): 23-37.   DOI: 10.11959/j.issn.2096-0271.2023004
    Abstract views: 756 | HTML views: 89 | PDF (3750 KB) downloads: 414

    3D dance is an indispensable form for virtual humans in the metaverse. It organically combines music and dance art, which greatly increases the appeal of the metaverse. Previous work usually treats 3D dance generation as a simple sequence generation task, but such methods struggle to match dance movements to the music beat and to guarantee the quality of long dance sequences. Inspired by the process by which humans learn to dance, a novel 3D dance framework, “Rhythm Dancer”, was proposed to solve these problems. The framework first uses VQ-VAE-2 to encode and quantize dances in a hierarchical way, which effectively improves the quality of dance generation. Then, a key-motion transition graph was built from the core dance movements on the rhythm points, which not only ensures that the generated dance movements fit the music beat but also increases the diversity of dance movements. To ensure smooth and natural connections between the core dance moves, a pose-interpolation network was proposed to learn the transition movements between key moves. Extensive experiments demonstrate that the framework not only avoids the instability and uncontrollability of long sequence generation, but also achieves a higher match between dance movements and music rhythm, reaching state-of-the-art results.
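    The sketch below illustrates, in a toy form, the two ideas named in the abstract: a key-motion transition graph whose nodes sit on music beats, and interpolation between key poses to fill the frames in between. The poses, graph, and beat times are invented, and plain linear interpolation stands in for the paper's learned pose-interpolation network and VQ-VAE-2 codes.

```python
import random
import numpy as np

# Toy key-motion transition graph: nodes are key poses placed on music beats,
# edges list which key poses may follow.
key_poses = {                       # pose = joint angles, made up for the demo
    "stand":  np.array([0.0, 0.0, 0.0]),
    "step_l": np.array([0.4, -0.2, 0.1]),
    "step_r": np.array([-0.4, 0.2, 0.1]),
    "spin":   np.array([0.0, 0.0, 1.5]),
}
transitions = {
    "stand":  ["step_l", "step_r"],
    "step_l": ["step_r", "spin"],
    "step_r": ["step_l", "spin"],
    "spin":   ["stand"],
}

def generate_dance(beat_times, fps=30, start="stand"):
    """Pick one key pose per beat by walking the graph, then interpolate."""
    random.seed(0)
    keys = [start]
    for _ in beat_times[1:]:
        keys.append(random.choice(transitions[keys[-1]]))
    frames = []
    for (t0, k0), (t1, k1) in zip(zip(beat_times, keys), zip(beat_times[1:], keys[1:])):
        n = max(1, int((t1 - t0) * fps))
        for j in range(n):                       # linear blend between key poses
            a = j / n
            frames.append((1 - a) * key_poses[k0] + a * key_poses[k1])
    frames.append(key_poses[keys[-1]])
    return np.stack(frames)

motion = generate_dance(beat_times=[0.0, 0.5, 1.0, 1.5, 2.0])
print(motion.shape)    # (frames, joint-angle dims), aligned to the beat grid
```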

    Authenticating and licensing architecture of data rights in data trade
    Qifeng TANG, Zhiqing SHAO, Yazhen YE
    Big Data Research    2022, 8 (3): 40-53.   DOI: 10.11959/j.issn.2096-0271.2022029
    Abstract views: 704 | HTML views: 113 | PDF (1417 KB) downloads: 628

    Data is a key factor of production in the digital economy, and establishing a data factor market is inevitable. Developing the data factor market requires efforts in the authentication of data rights, the objects of transaction, pricing mechanisms, exchange platforms, trade regulation, and so on. The rights and the authentication process necessary for a data product or data service to be traded on a data exchange were explored systematically. The form of the transaction object in data trade was designed as “data product/service + a right”. A variety of licenses for different forms of data products and data services were further designed, forming a licensing system that supports the exchange of data.

    Enlightenment of open access to public data in the European Union
    Qun ZHANG, Zhuo YIN, Hao YU, Weizhong WANG, Xiaojie JIA
    Big Data Research    2022, 8 (6): 143-152.   DOI: 10.11959/j.issn.2096-0271.2022047
    Abstract views: 687 | HTML views: 59 | PDF (1264 KB) downloads: 626

    Open access to public data contributes to the high-quality development of the digital economy. In the early stage, China actively introduced relevant policies to guide the opening and utilization of public data, and many local governments issued corresponding local rules and regulations, but rules and regulations for the opening and utilization of public data have not yet been issued at the national level. In contrast, the European Union has been continuously issuing and revising directives on open access to public data to promote technological innovation in the digital economy. The relevant practices of open access to public data in China were sorted out, and the main directions and characteristics of the EU's open data and public sector information re-use directives were analyzed. Combined with China's situation, enlightenments and suggestions on open access to public data in China were put forward, in the hope of further improving the policies, regulations, and mechanisms for open access to public data and promoting the deep sharing and orderly opening of public data in China.

    Research on the development path and countermeasures of data element value
    Yunlong YANG, Liang ZHANG, Xulei YANG
    Big Data Research    2023, 9 (6): 100-109.   DOI: 10.11959/j.issn.2096-0271.2022080
    Abstract views: 673 | HTML views: 106 | PDF (2022 KB) downloads: 862

    Based on the development of data element marketization at home and abroad, the development paths and characteristics of data element value realization in foreign countries were expounded. The current situation of China's data element market, in terms of the transaction market and application scenarios, was summarized. In view of the current development of China's data element market, and combined with its market environment and development characteristics, a data element market model with Chinese characteristics was constructed to speed up the release of data element value.

    Overview of observational data-based time series causal inference
    Zefan ZENG, Siya CHEN, Xi LONG, Guang JIN
    Big Data Research    2023, 9 (4): 139-158.   DOI: 10.11959/j.issn.2096-0271.2022059
    Abstract views: 659 | HTML views: 67 | PDF (2614 KB) downloads: 1482

    With the increase of data storage and the improvement of computing power, using observational data to infer time series causality has become a novel approach. Based on the properties and research status of time series causal inference, five observational data-based methods were summarized: Granger causal analysis, information theory-based methods, causal network structure learning algorithms, structural causal model-based methods, and methods based on nonlinear state-space models. Typical applications in economics and finance, medical science and biology, earth system science, and other engineering fields were then briefly introduced. Further, the advantages and disadvantages of the five methods were compared and ways to improve them were analyzed according to the focuses and difficulties of time series causal inference. Finally, future research directions were discussed.
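    Among the five families of methods, Granger causal analysis is the most common starting point. The sketch below runs the test on synthetic data with statsmodels (assumed to be installed); it is a minimal illustration of the idea, not code from any surveyed work.

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

# Synthetic example: y depends on lagged values of x, so x should
# Granger-cause y, while the reverse direction should not be significant.
rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * x[t - 1] + 0.3 * x[t - 2] + 0.1 * rng.normal()

# grangercausalitytests expects a 2D array and tests whether the series in
# the second column Granger-causes the series in the first column.
data_xy = np.column_stack([y, x])   # does x cause y?
results = grangercausalitytests(data_xy, maxlag=3)
for lag, (tests, _) in results.items():
    f_stat, p_value, _, _ = tests["ssr_ftest"]
    print(f"lag {lag}: F = {f_stat:.2f}, p = {p_value:.4f}")
```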

    Metaverse and big data: data insight and value connection in spatio-temporal intelligence
    Yang SHEN, Menglong YU
    Big Data Research    2023, 9 (1): 103-110.   DOI: 10.11959/j.issn.2096-0271.2023012
    Abstract views: 631 | HTML views: 210 | PDF (1402 KB) downloads: 481

    The metaverse realizes the simulation of and feedback to the physical world by making the data of space-time nodes intelligent, and big data is the core means of enhancing human insight into the world. Starting from the concept deduction and definition logic of the metaverse, this study sorted out four conceptual dimensions of the metaverse and proposed a five-level data association model based on the nine-point thinking of big data insight. From the establishment of a single metaverse system to the connection of multiple metaverse systems, this study explored data generation, data collection, data analysis, and data value mining in the metaverse. It also analyzed the connection of space data, time data, and international data in the metaverse, with the expectation of better understanding, describing, and transforming the world by studying data insight and value connection in the metaverse.

    Exchange mechanism for decentralized finance: a survey
    Yimin DENG, Shijing SI, Jianzong WANG, Zeyuan LI, Jing XIAO
    Big Data Research    2022, 8 (4): 67-84.   DOI: 10.11959/j.issn.2096-0271.2022064
    Abstract views: 620 | HTML views: 86 | PDF (2925 KB) downloads: 774

    Decentralized finance (DeFi) is a new paradigm for providing financial services based on blockchains and smart contracts, and it supports many applications including loans and derivatives. The exchange mechanism of DeFi has therefore attracted a great deal of attention, as it directly affects the stability of upper-level applications. The exchange mechanisms of DeFi were reviewed. Firstly, the concepts and protocols related to exchange mechanisms were introduced. Secondly, transaction mechanisms were classified by their approach of realization, and methods based on order books, automated market makers, and aggregators were discussed respectively, along with the differences and connections among their implementations. Finally, the fairness, security, and anonymity problems faced by decentralized exchanges were analyzed and summarized, and potential future research directions were proposed.
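    To make the automated market maker category concrete, the sketch below implements the classic constant-product rule (reserve_a * reserve_b = k) used by many decentralized exchanges. The pool sizes and fee rate are made-up values, and real AMM contracts handle details (integer math, slippage limits, fee accounting) omitted here.

```python
class ConstantProductAMM:
    """Toy constant-product automated market maker: reserve_a * reserve_b = k."""

    def __init__(self, reserve_a, reserve_b, fee=0.003):
        self.reserve_a = reserve_a
        self.reserve_b = reserve_b
        self.fee = fee                      # e.g. 0.3% taken from the input amount

    def price_a_in_b(self):
        # Marginal price of token A quoted in token B.
        return self.reserve_b / self.reserve_a

    def swap_a_for_b(self, amount_a):
        # Keep the product of reserves constant (after the fee) and pay out
        # whatever amount of B preserves the invariant.
        a_in = amount_a * (1 - self.fee)
        new_a = self.reserve_a + a_in
        new_b = self.reserve_a * self.reserve_b / new_a
        out_b = self.reserve_b - new_b
        self.reserve_a += amount_a
        self.reserve_b = new_b
        return out_b


pool = ConstantProductAMM(reserve_a=1_000.0, reserve_b=2_000_000.0)
print("price before:", pool.price_a_in_b())     # ~2000 B per A
received = pool.swap_a_for_b(10.0)              # larger trades move the price more
print("received B:", round(received, 2))
print("price after:", round(pool.price_a_in_b(), 2))
```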

    BoxedData: a data product form based on databox
    Yazhen YE, Yangyong ZHU
    Big Data Research    2022, 8 (3): 15-25.   DOI: 10.11959/j.issn.2096-0271.2022030
    Abstract views: 610 | HTML views: 145 | PDF (1646 KB) downloads: 419

    As in ordinary product markets, data products circulating in a data market can be divided into standard products and non-standard products. Currently, standardized data products such as music, images, and video clips circulate effectively in the market, while large-scale big data products in the broad sense face numerous circulation obstacles. One such obstacle is the measurement and valuation of data products, which calls for a measurable standard form of data products. On the basis of the data box model, BoxedData was proposed as a standard form of data products. A data product in the BoxedData form consists of two parts: inbox data and packing materials. Inbox data refers to a cubic data structure with the three dimensions of time, space, and content, which may include images, shapes, videos, sound, text, structured data, and other types of data. Packing materials include the product registration certificate, product instructions, product quality certificate, product compliance certificate, and other documents. BoxedData aims to provide the data factor market with a standard data product form that is measurable and evaluable.
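    The sketch below shows one way such a boxed product might be represented in code, with inbox data indexed by time, space, and content and packing materials carried alongside. The class and field names are illustrative assumptions based on this abstract, not a specification from the paper.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Tuple


@dataclass
class InboxRecord:
    """One cell of the inbox data cube: (time, space, content)."""
    timestamp: datetime                  # time dimension
    location: Tuple[float, float]        # space dimension, e.g. (lat, lon)
    content: Any                         # content dimension: text, image bytes, rows, ...


@dataclass
class BoxedData:
    """Illustrative boxed data product: inbox data plus packing materials."""
    inbox: List[InboxRecord] = field(default_factory=list)
    packing: Dict[str, str] = field(default_factory=dict)   # certificates, instructions

    def measure(self) -> Dict[str, Any]:
        # A measurable form supports simple volume/coverage metrics.
        times = [r.timestamp for r in self.inbox]
        return {
            "records": len(self.inbox),
            "time_span": (min(times), max(times)) if times else None,
            "documents": sorted(self.packing),
        }


box = BoxedData(
    inbox=[InboxRecord(datetime(2022, 3, 1), (31.23, 121.47), "sensor reading 42")],
    packing={"registration_certificate": "REG-001", "instructions": "usage notes"},
)
print(box.measure())
```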

    Data Circulation and Privacy Computing
    Big Data Research    2022, 8 (5): 1-2.   DOI: 10.11959/j.issn.2096-0271.2022071-1
    Abstract views: 570 | HTML views: 268 | PDF (741 KB) downloads: 481
    Data tenancy: a new paradigm for data circulation
    Wenqiang RUAN, Mingxin XU, Xinyu TU, Lushan SONG, Weili HAN
    Big Data Research    2022, 8 (5): 3-11.   DOI: 10.11959/j.issn.2096-0271.2022071
    Abstract views: 534 | HTML views: 140 | PDF (1582 KB) downloads: 500

    Data is becoming a new factor of production, and how to circulate data among multiple parties in a compliant and auditable way is very important for the formation of data value. A novel data circulation paradigm, data tenancy, was proposed from the perspective of privacy preservation and data utilization. The motivation for data tenancy was discussed, five requirements that data tenancy should satisfy were identified, and a secret sharing-based data tenancy technique was proposed.
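    To illustrate the kind of primitive a secret sharing-based tenancy technique builds on, the sketch below splits values into additive shares modulo a prime so that no single holder learns them, while simple computations can still be carried out on the shares. It is a generic textbook construction, not the paper's protocol.

```python
import secrets

PRIME = 2**61 - 1   # field modulus; any sufficiently large prime works for the demo

def share(value, n_parties):
    """Split `value` into n additive shares: individually random, jointly exact."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

def add_shared(shares_a, shares_b):
    # Each party adds its own shares locally; the sum of the results
    # reconstructs a + b without anyone seeing either input.
    return [(a + b) % PRIME for a, b in zip(shares_a, shares_b)]

salary_a, salary_b = 12_000, 15_500
sa, sb = share(salary_a, 3), share(salary_b, 3)
assert reconstruct(sa) == salary_a
total = reconstruct(add_shared(sa, sb))
print("jointly computed sum:", total)        # 27500, computed on shares only
```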

    Digital Economy
    Big Data Research    2022, 8 (4): 1-2.   DOI: 10.11959/j.issn.2096-0271.2022062-1
    Abstract views: 517 | HTML views: 237 | PDF (858 KB) downloads: 593
    Exploration and practice of data quality governance in privacy computing scenarios
    Yan ZHANG, Yifan YANG, Ren YI, Shengmei LUO, Jianfei TANG, Zhengxun XIA
    Big Data Research    2022, 8 (5): 55-73.   DOI: 10.11959/j.issn.2096-0271.2022073
    Abstract views: 510 | HTML views: 128 | PDF (2649 KB) downloads: 376

    Privacy computing is a new data processing technology that can realize the transformation and circulation of data value on the premise of protecting data privacy and security. However, the "invisible data" feature of privacy computing scenarios poses a great challenge to traditional data quality management, and mature solutions are still lacking. To address these problems, a data quality governance method and process suitable for privacy computing scenarios were proposed. A local and multi-party data quality evaluation system was constructed, which takes into account data quality governance in both the local domain and the federated domain. At the same time, a data contribution measurement method was proposed to explore a long-term incentive mechanism for privacy computing, improve the data quality in privacy computing, and improve the accuracy of computing results.

    From data quality to data products quality
    Li CAI, Yangyong ZHU
    Big Data Research    2022, 8 (3): 26-39.   DOI: 10.11959/j.issn.2096-0271.2022040
    Abstract views: 503 | HTML views: 125 | PDF (1723 KB) downloads: 648

    For a long time, the purpose of data quality research has been to meet the requirements for the normal operation of an organization's own information systems. With the construction and development of the data market, the requirements on data quality have shifted from “self-use” to “use by others” and “need for supervision”. The quality of data products in the data market is the focus of data users (buyers) and market regulators. The demands of users and regulators on data product quality were analyzed, and a framework for data product quality was proposed. On this basis, taking BoxedData as an example, the corresponding quality dimensions, quality indicators, and quality assessment models were constructed from the three aspects of time, space, and content integrity. The quality framework is suitable for detecting and assessing resource data products, and can provide effective detection methods and standards for data product buyers and market regulators.

    A hot-update-aware optimization to the query of LSM-Tree
    Qingyin LIN, Zhiguang CHEN
    Big Data Research    2023, 9 (1): 126-140.   DOI: 10.11959/j.issn.2096-0271.2022049
    Abstract views: 493 | HTML views: 56 | PDF (5496 KB) downloads: 396

    Key-value stores based on the LSM-Tree have been widely used. The LSM-Tree gains excellent write performance by collecting updated data in memory and then flushing them into storage in batches. However, in LSM-Tree-based key-value stores, old data produced by update operations are not eliminated from storage immediately, so a large amount of invalid data accumulates in the storage system, which eventually reduces read performance significantly. To address this problem, an active compaction method was proposed. By recording the history of updated key-value pairs, recognizing hot-updated keys, finding the SSTables that contain large amounts of invalid data, and triggering compaction on them as early as possible, the proposed method reduces write amplification and improves the read performance of LSM-Tree-based key-value stores. Experiments showed that this method reduces the average read latency of LevelDB by 65.2%, the 99th-percentile read tail latency by 69.4%, and write amplification by 71.4%.
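    The sketch below captures the general idea as stated in the abstract: count how often each key is updated, estimate how much stale data each SSTable holds, and compact the worst offender first. The SSTable representation, proxy metric, and threshold are simplified assumptions, not LevelDB internals.

```python
from collections import Counter

class HotUpdateCompactionPicker:
    """Toy picker: track per-key update counts and flag SSTables whose entries
    are mostly stale versions of hot-updated keys."""

    def __init__(self, invalid_ratio_threshold=0.5):
        self.update_counts = Counter()           # how many times each key was written
        self.threshold = invalid_ratio_threshold

    def record_write(self, key):
        self.update_counts[key] += 1

    def invalid_ratio(self, sstable_keys):
        # Any key written more than once has stale versions somewhere on disk;
        # use that as a cheap proxy for invalid data inside this SSTable.
        if not sstable_keys:
            return 0.0
        stale = sum(1 for k in sstable_keys if self.update_counts[k] > 1)
        return stale / len(sstable_keys)

    def pick_compaction(self, sstables):
        """Return the id of the SSTable most worth compacting, if any."""
        ratios = {sid: self.invalid_ratio(keys) for sid, keys in sstables.items()}
        sid, ratio = max(ratios.items(), key=lambda kv: kv[1])
        return sid if ratio >= self.threshold else None


picker = HotUpdateCompactionPicker()
for key in ["user:1", "user:1", "user:1", "user:2", "user:3", "user:3"]:
    picker.record_write(key)                     # user:1 and user:3 are hot keys

sstables = {"sst-001": ["user:1", "user:3", "user:9"], "sst-002": ["user:2", "user:8"]}
print(picker.pick_compaction(sstables))          # sst-001 holds the most stale data
```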

    A measurable technical form of data is needed to include data assets in accounting statements
    Big Data Research    2023, 9 (6): 184-187.   DOI: 10.11959/j.issn.2096-0271.2023075
    Abstract views: 488 | HTML views: 245 | PDF (1201 KB) downloads: 260
    Visualization in digital humanities
    Yuchu LUO, Hao WU, Yuhan GUO, Shaocong TAN, Can LIU, Ruike JIANG, Xiaoru YUAN
    Big Data Research    2022, 8 (6): 74-93.   DOI: 10.11959/j.issn.2096-0271.2022085
    Abstract views: 475 | HTML views: 95 | PDF (20687 KB) downloads: 282

    The development of information technology has promoted the emergence of a new scientific research paradigm, and the social sciences and humanities have gradually developed data-driven research methods in recent years. From the perspective of visualization, the current status of visualization applications in the digital humanities was summarized at three levels, task, data, and application, through an analysis of papers at the Digital Humanities conference organized by the Alliance of Digital Humanities Organizations. By analyzing projects set up by experts from different backgrounds, i.e., humanities, visualization, and art, the great potential of multidisciplinary cooperation for improving the quality of digital humanities plus visualization projects was revealed. The practice of Peking University in exploring this new paradigm of multidisciplinary cooperation in digital humanities plus visualization was shared, covering multidisciplinary education, practical experience in promotion, and research experience in intelligent visualization. Finally, two directions for the future development of the interdiscipline between digital humanities and visualization were identified: cooperation between experts and collaboration between humans and computers.

    Research on privacy data security sharing scheme based on blockchain and function encryption
    Yi LI, Jinsong WANG, Hongwei ZHANG
    Big Data Research    2022, 8 (5): 33-44.   DOI: 10.11959/j.issn.2096-0271.2022072
    Abstract views: 472 | HTML views: 107 | PDF (1752 KB) downloads: 474

    Blockchain technology has provided new ideas for data validation, data traceability, data trustworthiness, and data availability in data sharing, but the security of private data in data sharing still faces many challenges. Firstly, the current status of blockchain-based data sharing research was reviewed. Then a secure sharing model for private data was proposed: the private data are encrypted with functional encryption, and a proof of computational correctness is generated with zero-knowledge proof technology, realizing secure and reliable data sharing in which data are “available but not visible”. The experimental results show that the sharing delay and economic overhead of the model are within an acceptable range, which demonstrates the security and feasibility of the model.

    A Chinese text sentiment analysis method combining language knowledge and deep learning
    Kangting XU, Wei SONG
    Big Data Research    2022, 8 (3): 115-127.   DOI: 10.11959/j.issn.2096-0271.2022026
    Abstract views: 465 | HTML views: 86 | PDF (1968 KB) downloads: 675

    At present, in research on Chinese text sentiment analysis, methods based on semantic rules and sentiment dictionaries usually need manually set sentiment thresholds, while methods based on deep learning cannot fully extract sentiment features because they fail to use linguistic knowledge such as semantic rules and sentiment dictionaries. To address the shortcomings of the two kinds of methods, a text sentiment analysis method combining linguistic knowledge and deep learning was proposed. Firstly, the key sentiment segments in the text were extracted according to semantic rules. Secondly, more explicit sentiment words were extracted from the key sentiment segments according to the sentiment dictionary to construct a sentiment set. Thirdly, deep features were extracted from the original text, the key sentiment segments, and the sentiment set using a deep learning model. Finally, the features were weighted, fused, and fed to a classifier to judge sentiment polarity. The experimental results show that, compared with deep learning models without linguistic knowledge, this method significantly improves sentiment polarity classification.
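    A heavily simplified sketch of this pipeline is given below: a rule extracts the key sentiment segment, a dictionary lookup builds the sentiment set, and the three feature vectors are fused with fixed weights before classification. The rule, the tiny dictionary, the hashed bag-of-words standing in for deep features, and the weights are all placeholder assumptions; the paper uses a real sentiment lexicon and a neural encoder.

```python
import re
import numpy as np

SENTIMENT_DICT = {"excellent": 1.0, "great": 0.8, "poor": -0.8, "terrible": -1.0}

def key_segment(text):
    # Toy semantic rule: the clause after an adversative conjunction carries
    # the decisive sentiment ("..., but the battery is terrible").
    match = re.search(r"\b(but|however)\b(.*)", text, flags=re.IGNORECASE)
    return match.group(2).strip() if match else text

def sentiment_set(segment):
    return [w for w in re.findall(r"[a-z]+", segment.lower()) if w in SENTIMENT_DICT]

def features(text, dim=16):
    # Stand-in for a neural encoder: hashed bag-of-words vector.
    vec = np.zeros(dim)
    for w in re.findall(r"[a-z]+", text.lower()):
        vec[hash(w) % dim] += 1.0
    return vec

def fused_features(text, weights=(0.3, 0.4, 0.3)):
    segment = key_segment(text)
    words = sentiment_set(segment)
    parts = [features(text), features(segment), features(" ".join(words))]
    return sum(w * p for w, p in zip(weights, parts))   # weighted fusion

text = "The screen looks great, but the battery life is terrible."
print("key segment:", key_segment(text))
print("sentiment set:", sentiment_set(key_segment(text)))
print("fused feature vector shape:", fused_features(text).shape)
```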

    Exploration and practice of trusted AI governance framework
    Zhengxun XIA, Jianfei TANG, Shengmei LUO, Yan ZHANG
    Big Data Research    2022, 8 (4): 145-164.   DOI: 10.11959/j.issn.2096-0271.2022036
    Abstract views: 462 | HTML views: 71 | PDF (2276 KB) downloads: 636

    Artificial intelligence (AI) has further improved the automation of information systems; however, issues such as data security, privacy protection, and fairness and ethics have been exposed during its large-scale application. To address these issues and promote the transition of AI from available systems to trusted systems, the T-DACM trusted AI governance framework was proposed to improve the credibility of AI at the four levels of data, algorithm, calculation, and management. Different components were designed to address specific issues such as data security, model security, privacy protection, model black boxes, fairness, accountability, and traceability. A T-DACM practice case provides a demonstration of the trusted AI governance framework for the industry and a reference for subsequent product research and development based on it.

    A fast text structuring methodology of TCM medical records based on NLP
    Xiaoxia XIAO, Mingting LIU, Fengtianci YANG, Jianjianxian LIU, Yang YANG, Yue SHI
    Big Data Research    2022, 8 (3): 128-139.   DOI: 10.11959/j.issn.2096-0271.2022025
    Abstract views: 457 | HTML views: 89 | PDF (2564 KB) downloads: 704

    Traditional Chinese medicine (TCM) medical records are the most valuable documents from which TCM doctors learn clinical experience. Structured TCM medical records facilitate the extraction of clinical knowledge with machine learning and other methods, which can accelerate the inheritance of TCM. A fast text structuring methodology for TCM medical records based on natural language processing (NLP) was proposed to structure clinical cases. Essence of Chinese Modern Famous Chinese Medical Records was selected as the corpus to be structured; the text in screenshots of the medical records was recognized by optical character recognition (OCR) and initially structured. A simple symptom dictionary was constructed, an improved N-gram model combined with the dictionary was used to recognize symptoms, signs, and other terms in the text, and the dictionary was updated during the structuring process. In the end, 4 754 text medical records were structured. The final model was tested on 666 medical records selected randomly from the corpus, and its F1 value reached 82.99%.
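    The sketch below illustrates the dictionary-plus-N-gram idea in a toy form: slide N-gram windows over a record, accept spans found in the symptom dictionary, and collect frequent unseen spans as candidate new entries, mimicking the dictionary update during structuring. The sample dictionary, records, and thresholds are invented; the real method works on OCR output with an improved N-gram model.

```python
from collections import Counter

# Toy symptom dictionary; the paper builds a much larger one and updates it
# while structuring the records.
symptom_dict = {"头痛", "发热", "咳嗽", "乏力"}

def extract_symptoms(record, max_n=4):
    """Greedy longest-match over character N-grams against the dictionary."""
    found, i = [], 0
    while i < len(record):
        for n in range(max_n, 0, -1):               # prefer the longest match
            span = record[i:i + n]
            if span in symptom_dict:
                found.append(span)
                i += n
                break
        else:
            i += 1
    return found

def propose_new_entries(records, min_count=2, n=2):
    # Frequent unseen bigrams become candidate dictionary entries,
    # mimicking the dictionary-update step during structuring.
    counts = Counter(
        rec[i:i + n]
        for rec in records
        for i in range(len(rec) - n + 1)
        if rec[i:i + n] not in symptom_dict
    )
    return [span for span, c in counts.items() if c >= min_count]

records = ["患者头痛发热三日，伴咳嗽少痰", "恶寒发热，头痛乏力", "咳嗽三日，恶寒无汗"]
for rec in records:
    print(rec, "->", extract_symptoms(rec))
print("candidate new entries:", propose_new_entries(records))
```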

    Digital transformation service platform: enhancing enterprise competitiveness in a new competitive situation
    Yazhen YE, Yangyong ZHU
    Big Data Research    2023, 9 (3): 3-14.   DOI: 10.11959/j.issn.2096-0271.2023029
    Abstract views: 453 | HTML views: 198 | PDF (1743 KB) downloads: 485

    With the improvement of data capabilities and the development of emerging technologies, profound changes are occurring in economic patterns and the competitive structure of industries. In order to respond to future opportunities and challenges and to improve the competitiveness of enterprises in this new situation, it is necessary to understand and master digital transformation. The new competitive situation, in which traditional enterprises will gradually be replaced by digitally transformed ones, was discussed, and digital transformation was distinguished from digitization. The main challenges traditional enterprises face during digital transformation were pinpointed: the lack of funds, talent, data, and awareness. A digital transformation service platform oriented to the new competitive situation was proposed, providing a feasible solution for enhancing enterprise competitiveness and conducting digital transformation.
