Journal on Communications ›› 2023, Vol. 44 ›› Issue (8): 228-240. doi: 10.11959/j.issn.1000-436x.2023143

• Correspondences

Parallel deep forest algorithm based on Spark and three-way interactive information

Yimin MAO1,2, Zhan ZHOU1, Zhigang CHEN3   

  1 School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China
  2 College of Information Engineering, Shaoguan University, Shaoguan 512026, China
  3 College of Computer Science and Engineering, Central South University, Changsha 410083, China
  • Revised: 2023-07-01 Online: 2023-08-01 Published: 2023-08-01
  • Supported by:
    Key Promotion Project of Guangdong Province (2022ZDJS048); “2030 Innovation Megaprojects”-New Generation Artificial Intelligence Project (2020AAA0109605)

Abstract:

To address the problems of parallel deep forests, namely excessive redundant and irrelevant features, overly long class vectors, slow model convergence, and low parallel training efficiency, a parallel deep forest algorithm based on Spark and three-way interactive information was proposed. Firstly, a feature selection based on feature interaction (FSFI) strategy was proposed to filter the original features and eliminate irrelevant and redundant ones. Secondly, a multi-granularity vector elimination (MGVE) strategy was proposed, which fuses similar class vectors to shorten the class vector length. Subsequently, a cascade forest feature enhancement (CFFE) strategy was proposed to improve information utilization and accelerate model convergence. Finally, a multi-level load balancing (MLB) strategy was proposed, combined with the Spark framework, to improve parallelization efficiency through adaptive sub-forest division and heterogeneous skew data partitioning. Experimental results demonstrate that the proposed algorithm significantly improves classification performance and reduces parallel training time.
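
As a rough illustration of the quantity behind the FSFI step, the following Python sketch computes three-way interaction information for discrete features and uses it in a toy filter. It is not the authors' implementation: the sign convention I(X;Y;Z) = I(X;Y|Z) - I(X;Y), the helper names, the `eps` threshold, and the `select_features` rule are all assumptions made for illustration. The sketch covers only the removal of irrelevant features, not redundancy elimination or the Spark parallelization.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    joint = Counter(zip(*columns))
    n = sum(joint.values())
    return -sum(c / n * log2(c / n) for c in joint.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def interaction_information(x, y, z):
    """I(X;Y;Z) = I(X;Y|Z) - I(X;Y) (assumed sign convention).

    Under this convention a positive value suggests synergy between
    x and y with respect to z, and a negative value suggests redundancy.
    """
    return conditional_mutual_information(x, y, z) - mutual_information(x, y)


def select_features(features, label, eps=1e-9):
    """Hypothetical filter: keep a feature if it is relevant on its own
    or interacts positively with some other feature given the label."""
    kept = []
    for name, col in features.items():
        relevant = mutual_information(col, label) > eps
        synergistic = any(
            interaction_information(col, other, label) > eps
            for other_name, other in features.items()
            if other_name != name
        )
        if relevant or synergistic:
            kept.append(name)
    return kept


# Toy data: the label is the XOR of features "a" and "b"; "noise" is constant.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
label = [u ^ v for u, v in zip(a, b)]
print(select_features({"a": a, "b": b, "noise": [0, 0, 0, 0]}, label))
```

On this toy data neither `a` nor `b` carries information about the label on its own, yet both are kept because they interact, while the constant `noise` column is dropped. A filter based on pairwise relevance alone would discard all three.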

Key words: Spark framework, parallel deep forest algorithm, feature selection, multi-level load balancing

