Journal on Communications ›› 2021, Vol. 42 ›› Issue (5): 122-136. doi: 10.11959/j.issn.1000-436x.2021052

• Papers •

Parallel association rules incremental mining algorithm based on information entropy and genetic algorithm

Yimin MAO¹, Qianhu DENG¹, Zhigang CHEN²

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China
  2. College of Computer Science and Engineering, Central South University, Changsha 410083, China
  • Revised: 2021-02-04 Online: 2021-05-25 Published: 2021-05-01
  • Supported by:
    The National Natural Science Foundation of China (41562019); The National Natural Science Foundation of China (61762046); The National Key Research and Development Program of China (2018YFC1504705)

Abstract:

Can-tree based incremental association rule algorithms face several problems in big data environments: the tree structure occupies too much space, the support threshold cannot be set dynamically, and too much time is spent transferring data between the Map and Reduce stages. To address these problems, a MapReduce-based parallel association rules incremental mining algorithm using information entropy and genetic algorithm (MR-PARIMIEG) was proposed. First, a similar items merging method based on information entropy (SIM-IE) was designed to merge similar data items, and a Can-tree was constructed from the merged data set, thereby reducing the space occupied by the tree structure. Second, a method for obtaining the dynamic support threshold using a genetic algorithm (DST-GA) was proposed to obtain a relatively optimal dynamic support threshold in the big data environment, and frequent itemsets were mined according to this threshold to avoid the unnecessary time spent mining redundant frequent patterns. Finally, during MapReduce parallel operation, the parallel LZO data compression algorithm was used to compress the output data of the Map stage, thereby reducing the size of the transmitted data and improving the running speed of the algorithm. Experimental simulation results show that MR-PARIMIEG performs better when mining frequent itemsets in the big data environment and is suitable for parallel processing of larger data sets.

Key words: Can-tree, information entropy, big data, incremental mining, data compression
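
For readers unfamiliar with the Can-tree named in the abstract, the following is a minimal sketch of the general idea only, not the MR-PARIMIEG implementation. It assumes lexicographic order as the canonical item order; the class and method names (`CanTreeNode`, `CanTree`, `insert`, `support`, `node_count`) are hypothetical. Because the item order is fixed and does not depend on frequency, new transactions can be inserted incrementally without restructuring the tree. SIM-IE, DST-GA, and LZO compression are not modelled here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CanTreeNode:
    item: Optional[str] = None
    count: int = 0
    children: Dict[str, "CanTreeNode"] = field(default_factory=dict)


class CanTree:
    """Prefix tree over transactions sorted in a fixed (canonical) item order."""

    def __init__(self):
        self.root = CanTreeNode()

    def insert(self, transaction):
        """Insert one transaction; lexicographic order is the assumed canonical order."""
        node = self.root
        for item in sorted(set(transaction)):
            if item not in node.children:
                node.children[item] = CanTreeNode(item=item)
            node = node.children[item]
            node.count += 1

    def node_count(self):
        """Number of nodes (excluding the root), a rough proxy for tree size."""
        stack, total = list(self.root.children.values()), 0
        while stack:
            node = stack.pop()
            total += 1
            stack.extend(node.children.values())
        return total

    def support(self, itemset):
        """Count the transactions that contain every item of a non-empty itemset."""
        target = sorted(set(itemset))
        if not target:
            raise ValueError("itemset must be non-empty")

        def walk(node, matched):
            total = 0
            for child in node.children.values():
                if child.item == target[matched]:
                    if matched + 1 == len(target):
                        # Every transaction through this node contains the itemset.
                        total += child.count
                    else:
                        total += walk(child, matched + 1)
                elif child.item < target[matched]:
                    # The next needed item may still appear deeper on this path.
                    total += walk(child, matched)
                # Otherwise the needed item cannot appear below (canonical order).
            return total

        return walk(self.root, 0)


tree = CanTree()
for t in [["b", "a", "c"], ["a", "c"], ["a", "b"]]:
    tree.insert(t)

# Incremental update: a new transaction is added without rebuilding the tree.
tree.insert(["c", "b"])
print(tree.node_count(), tree.support(["a", "c"]), tree.support(["b", "c"]))
```

In this sketch, transactions that share a canonical prefix share nodes, so the node count grows more slowly than the total number of items inserted. That is the property a merging step such as SIM-IE would aim to strengthen.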
