Local density-based similarity matrix construction for spectral clustering

doi:10.3969/j.issn.1000-436x.2013.03.003

Abstract

Based on the local and global consistency of the distribution of sample data points, a spectral clustering algorithm that constructs the similarity matrix from local density is proposed. First, a definition of local density is given by analyzing the distribution characteristics of the sample data points; the sample points are sorted from dense to sparse by local density, and an undirected graph is built according to a designed connection strategy. Then, drawing on the idea of the GN algorithm, a method for computing the weight matrix from edge betweenness is given, and the similarity matrix for spectral clustering is obtained through data conversion. Finally, the number of clusters is determined by the position of the first eigengap maximum, and the data points in the eigenvector space are clustered with a classical clustering method. Tests on artificial simulated data sets and UCI data sets show that the proposed spectral clustering algorithm has good robustness.

Key words: spectral clustering, similarity matrix, local density, undirected graph construction, edge betweenness
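The edge-betweenness step that the abstract borrows from the GN algorithm can be illustrated with a small sketch. The code below is not the authors' implementation: it computes plain edge betweenness on an unweighted, undirected graph with a Brandes-style breadth-first search, and the toy graph `adj`, the function name `edge_betweenness`, and the inverse mapping from betweenness to similarity are all assumptions made for illustration. The abstract does not specify the paper's actual data conversion.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph without self-loops.

    adj maps each node to a list of its neighbours (symmetric adjacency).
    Returns a dict keyed by frozenset({u, v}).
    """
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Forward pass: BFS from s, counting shortest paths (sigma)
        # and recording shortest-path predecessors.
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: accumulate each edge's share of the shortest paths.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered node pair was visited from both endpoints, so halve.
    return {e: b / 2.0 for e, b in bet.items()}


# Hypothetical toy graph: two triangles joined by the bridge edge 2-3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
bet = edge_betweenness(adj)

# Assumed conversion (not from the paper): edges that carry many shortest
# paths tend to link different groups, so they receive a low similarity.
similarity = {e: 1.0 / b for e, b in bet.items()}

bridge = max(bet, key=bet.get)
print(sorted(bridge), bet[bridge])
```

In this toy graph the bridge between the two triangles has the highest betweenness, so under the assumed inverse mapping it gets the lowest similarity, which is the behaviour a spectral method needs in order to separate the two groups.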

Jian WU, Zhi-ming CUI, Yu-jie SHI, Sheng-li SHENG, Sheng-rong GONG. Local density-based similarity matrix construction for spectral clustering[J]. Journal on Communications, 2013, 34(3): 14-22.



Table 1

Data set	Dimensions	Samples	True classes
Satimage	36	444	6
Iris	4	150	3
Ionosphere	34	351	2
New-thyroid	5	215	3

Table 2

Data set	LDSC CT/%	LDSC time/s	LDSC K	ASC CT/%	ASC time/s	ASC s
Satimage	80.41	2.3868	5	77.93	2.3153	0.07
Iris	93.33	0.2149	8	92.00	0.1405	0.16
Ionosphere	90.88	1.0926	8	72.08	1.0424	0.20
New-thyroid	89.76	0.3873	5	89.30	0.3235	0.13