Big Data Research ›› 2023, Vol. 9 ›› Issue (4): 59-68. doi: 10.11959/j.issn.2096-0271.2023048

• TOPIC: CROSS-DOMAIN DATA MANAGEMENT •

Research on iterative data cleaning of human-computer interaction

Yida LIU, Xiaoou DING, Hongzhi WANG, Donghua YANG   

  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
  • Online: 2023-07-01 Published: 2023-07-01
  • Supported by:
    The National Key Research and Development Program of China (2021YFB3300502); The National Natural Science Foundation of China (62202126); The National Natural Science Foundation of China (62232005); China Postdoctoral Science Foundation (2022M720957); Heilongjiang Postdoctoral Financial Assistance (LBH-Z21137)

Abstract:

Advances in data collection technology have led to a rapid increase in the size of datasets. The large scale and high complexity of the data give rise to serious data quality issues, which makes data cleaning a necessary and important step in data-driven work. To reduce the cost of human annotation while preserving cleaning accuracy, an iterative data cleaning method with human participation (IDCHI) is proposed. In its detection module, the method introduces a data selection optimization that gives the classifier high accuracy from the initial stage. It further introduces a strategy for selecting which data to annotate manually, which effectively reduces the amount of manual annotation required. Experimental results show that the proposed method is effective and efficient in cleaning erroneous data.

Key words: data cleaning, human-in-the-loop, iteration, mini-batch gradient descent
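
The abstract describes a loop in which a classifier detects erroneous data, a selection step decides which records a human should annotate, and the classifier is then updated. The following is a minimal sketch of that general pattern and not the IDCHI method itself: the abstract does not specify the classifier or the selection criteria, so a logistic-regression detector trained by mini-batch gradient descent and a simple uncertainty-sampling rule are used here as stand-ins. All names (`train_minibatch`, `iterative_clean`, `ask_human`) and all hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    # Estimated probability that record x is erroneous.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_minibatch(X, y, w, b, rng, epochs=50, batch_size=8, lr=0.3):
    # Logistic regression fitted by mini-batch gradient descent
    # (an assumed detector; warm-started from the given w and b).
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            grad_w = [0.0] * len(w)
            grad_b = 0.0
            for i in batch:
                err = predict_proba(w, b, X[i]) - y[i]
                for j, xj in enumerate(X[i]):
                    grad_w[j] += err * xj
                grad_b += err
            w = [wj - lr * gj / len(batch) for wj, gj in zip(w, grad_w)]
            b -= lr * grad_b / len(batch)
    return w, b


def iterative_clean(X, ask_human, seed_idx, rounds=5, per_round=5):
    # ask_human(i) returns 1 if record i is erroneous, else 0.
    rng = random.Random(0)
    labeled = {i: ask_human(i) for i in seed_idx}
    w, b = [0.0] * len(X[0]), 0.0

    def fit(w, b):
        ids = sorted(labeled)
        return train_minibatch([X[i] for i in ids],
                               [labeled[i] for i in ids], w, b, rng)

    for _ in range(rounds):
        w, b = fit(w, b)
        pool = [i for i in range(len(X)) if i not in labeled]
        if not pool:
            break
        # Uncertainty sampling (assumed rule): ask about the records
        # whose predicted probability is closest to 0.5.
        pool.sort(key=lambda i: abs(predict_proba(w, b, X[i]) - 0.5))
        for i in pool[:per_round]:
            labeled[i] = ask_human(i)

    w, b = fit(w, b)  # final update with the last batch of annotations
    # Human labels take precedence; the classifier flags everything else.
    flags = [labeled[i] if i in labeled
             else int(predict_proba(w, b, X[i]) >= 0.5)
             for i in range(len(X))]
    return flags, len(labeled)
```

In this sketch the annotation budget is bounded by `len(seed_idx) + rounds * per_round`, so the human inspects only a fraction of the records while the classifier handles the rest. In practice `ask_human` would prompt an annotator; in a simulation it can simply look up ground-truth labels.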
