Telecommunications Science ›› 2016, Vol. 32 ›› Issue (8): 124-127. doi: 10.11959/j.issn.1000-0801.2016214

• Research and Development •

An improved CLTree algorithm

Zhuohang LI

  1. College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou 310058, China
  • Online: 2016-08-20 Published: 2017-04-26

Abstract:

An improved algorithm called CLTree-R was proposed to compensate for the shortcomings of the CLTree clustering algorithm, namely low accuracy and low efficiency. CLTree-R was then applied to cluster analysis of UCI data sets. To improve efficiency, the data were processed in parallel on the Spark platform. Experimental results show that the algorithm yields a reasonable customer segmentation when applied to cluster analysis of the official data set.

Key words: clustering, Spark, data mining, parallelization
