Telecommunications Science ›› 2012, Vol. 28 ›› Issue (7): 40-47. doi: 10.3969/j.issn.1000-0801.2012.07.012

• Research and Development •

Parallel Ensemble Classification Algorithm Based on the MapReduce Technology

Chunhua Ju1,2, Jiangbo Zou1, Zui Zhang1, Jianliang Wei1

  1 School of Computer Science & Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China
  2 Center for Studies of Modern Business, Zhejiang Gongshang University, Hangzhou 310000, China
  • Online: 2017-02-22 Published: 2017-02-22

Abstract:

Given limited computer memory, how to combine classifiers effectively and how to select among them optimally are central topics in machine learning. Classic ensemble classification algorithms achieve high classification accuracy on small data sets. On large volumes of data, however, training and classifying with many base classifiers consumes substantial computing resources and lowers computational efficiency, which makes these algorithms poorly suited to today's massive data. To address the shortcoming that existing ensemble classification algorithms are suitable only for small-scale data sets, this paper analyzes the characteristics of ensemble classifiers and applies a parallel ensemble algorithm that combines ensemble classifier aggregation with MapReduce, a cloud computing technology, to process massive data in parallel. Simulation experiments on an Amazon compute cluster show that the algorithm is reasonably efficient and feasible.
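
The general idea of pairing an ensemble with MapReduce can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes that the map step trains one base classifier per data partition and that the reduce step aggregates their predictions by majority vote. The nearest-centroid base learner, the round-robin partitioning, and all function names (`map_train`, `reduce_vote`, `split`) are hypothetical choices made for illustration, and the code runs sequentially in one process rather than on a cluster.

```python
from collections import Counter


def map_train(partition):
    """Map step (assumed): fit one nearest-centroid base classifier per partition.

    partition is a list of (features, label) pairs; returns {label: centroid}.
    """
    grouped = {}
    for features, label in partition:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def predict_one(model, features):
    """Classify a point by its closest centroid (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: dist(model[label]))


def reduce_vote(models, features):
    """Reduce step (assumed): aggregate base predictions by majority vote."""
    votes = Counter(predict_one(m, features) for m in models)
    return votes.most_common(1)[0][0]


def split(data, n_parts):
    """Round-robin partitioning, standing in for a framework's input splits."""
    return [data[i::n_parts] for i in range(n_parts)]


# Toy data: two well-separated classes in two dimensions.
data = [((0.0, 0.1), "a"), ((0.2, 0.0), "a"), ((0.1, 0.3), "a"),
        ((1.0, 0.9), "b"), ((0.8, 1.1), "b"), ((1.2, 1.0), "b")]

# Each partition is trained independently, so these calls could run in parallel.
models = [map_train(p) for p in split(data, 3)]
prediction = reduce_vote(models, (0.9, 1.0))
```

Because each base classifier sees only its own partition, no single worker needs to hold the full data set in memory, which is the resource constraint the abstract describes.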

Key words: cloud computing, ensemble classifier, parallel integration, MapReduce
