Journal on Communications ›› 2014, Vol. 35 ›› Issue (6): 15-24. doi: 10.3969/j.issn.1000-436x.2014.06.003

• Academic paper

Fuzzy co-clustering algorithm for high-order heterogeneous data

Shao-bin HUANG (黄少滨), Xin-xin YANG (杨欣欣), Lin-shan SHEN (申林山), Yan-mei LI (李艳梅)

  1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
  • Online: 2014-06-25 Published: 2017-06-29
  • Supported by:
    The National Natural Science Foundation of China; The National Natural Science Foundation of China; The National Natural Science Foundation of China; The National Key Technology R&D Program; The National Key Technology R&D Program; The Science Foundation for Post Doctorate Research; The Fundamental Research Funds for the Central Universities; The Fundamental Research Funds for the Central Universities

Abstract:

To analyze the clustering results of high-order heterogeneous data more effectively in the regions where clusters overlap, a fuzzy co-clustering algorithm for high-order heterogeneous data (HFCC) was developed. HFCC minimizes the weighted distances between objects and cluster centers in each feature space. Iterative update rules for the fuzzy memberships of objects and the weights of features were derived, an iterative algorithm was designed for the clustering process, and the convergence of this algorithm was proved theoretically. To estimate the number of clusters, the GXB validity index was proposed by generalizing the XB validity index so that it can measure the quality of high-order heterogeneous clustering results. Experimental results show that HFCC effectively detects the overlapping cluster structures hidden in the data, and that the quality of its clustering results is clearly superior to that of five representative hard-partition high-order co-clustering algorithms. In addition, the GXB validity index effectively determines the number of clusters in high-order heterogeneous data.

Key words: high-order heterogeneous data, co-clustering, fuzzy clustering
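
The abstract does not give the HFCC update rules or the GXB formula, so they are not reproduced here. As background only, the following minimal sketch shows the classical single-feature-space building blocks that the abstract refers to: the standard fuzzy c-means membership and center updates, and the original Xie-Beni (XB) validity index in its usual form with squared memberships. All function names, the fuzzifier `m = 2`, the fixed iteration count, and the toy data are assumptions made for illustration; this is not the authors' algorithm.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership update (not the HFCC rule)."""
    exponent = 1.0 / (m - 1.0)
    memberships = []
    for x in points:
        d2 = [sq_dist(x, c) for c in centers]
        if any(d == 0.0 for d in d2):
            # The point sits exactly on one or more centers:
            # share all membership among those centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in d2]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        inv = [(1.0 / d) ** exponent for d in d2]
        total = sum(inv)
        memberships.append([v / total for v in inv])
    return memberships


def update_centers(points, u, m=2.0):
    """Each center is the mean of the points weighted by membership**m."""
    dim = len(points[0])
    centers = []
    for k in range(len(u[0])):
        w = [row[k] ** m for row in u]
        total = sum(w)
        centers.append(
            [sum(wi * p[d] for wi, p in zip(w, points)) / total
             for d in range(dim)]
        )
    return centers


def fuzzy_c_means(points, init_centers, m=2.0, iters=50):
    """Alternate the two updates for a fixed number of iterations."""
    centers = [list(c) for c in init_centers]
    for _ in range(iters):
        u = fcm_memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return fcm_memberships(points, centers, m), centers


def xie_beni(points, u, centers):
    """Original XB index: compactness / (n * minimum center separation).

    Lower values indicate compact, well-separated clusters.
    Requires at least two centers.
    """
    compactness = sum(
        u[i][k] ** 2 * sq_dist(x, c)
        for i, x in enumerate(points)
        for k, c in enumerate(centers)
    )
    separation = min(
        sq_dist(a, b)
        for i, a in enumerate(centers)
        for b in centers[i + 1:]
    )
    return compactness / (len(points) * separation)


# Toy data: two mirror-image groups plus one point midway between them.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (5.0, 5.0), (5.0, 4.0), (4.0, 5.0),
          (2.5, 2.5)]
u, centers = fuzzy_c_means(points, [(0.0, 0.0), (5.0, 5.0)])
print("membership of the midpoint:", u[-1])
print("XB index:", xie_beni(points, u, centers))
```

In this toy setup the midpoint receives a split membership rather than being forced into one cluster, which is the kind of overlap information a hard partition discards. A common way to use a validity index such as XB is to run the clustering for several candidate cluster counts and prefer the count with the lowest index value.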
