Journal on Communications, 2020, Vol. 41, Issue (8): 155-164. doi: 10.11959/j.issn.1000-436x.2020152

CMDC: an iterative algorithm for complementary multi-view document clustering

Ruizhang HUANG1,2, Ruina BAI1, Yanping CHEN1,2, Yongbin QIN1,2, Xinyu CHENG1,3, Youliang TIAN1,2

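  1 College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
  2 Guizhou Provincial Key Laboratory of Public Big Data, Guiyang 550025, China
  3 Guizhou Intelligent Human-Computer Interaction Engineering Technology Research Center, Guiyang 550025, China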
  • Revised: 2020-06-10  Online: 2020-08-25  Published: 2020-09-05
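  • About the authors: Ruizhang HUANG (1979- ), female, born in Tianjin, Ph.D., associate professor and master's supervisor at Guizhou University; her main research interests include data mining, text mining, machine learning, and information retrieval | Ruina BAI (1994- ), female, born in Xing County, Shanxi, master's student at Guizhou University; her main research interests include text mining and machine learning | Yanping CHEN (1980- ), male, born in Changshun, Guizhou, Ph.D., associate professor and master's supervisor at Guizhou University; his main research interests include artificial intelligence and natural language processing | Yongbin QIN (1980- ), male, born in Zhaoyuan, Shandong, Ph.D., professor and doctoral supervisor at Guizhou University; his main research interests include smart and intelligent computing, and big data management and applications | Xinyu CHENG (1978- ), male, born in Suiyang, Guizhou, associate professor at Guizhou University; his main research interests include machine learning and computer vision | Youliang TIAN (1982- ), male, born in Pan County, Guizhou, Ph.D., professor at Guizhou University; his main research interests include algorithmic game theory, cryptography and security protocols, big data security and privacy protection, and electronic currency and blockchain technology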
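  • Supported by:
    The National Natural Science Foundation of China (61462011); The National Natural Science Foundation of China (91746116); The Joint Funds of the National Natural Science Foundation of China (U1836205); The Key Projects of Science and Technology of Guizhou ([2020]1Z055)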

Abstract:

To address the problems that traditional multi-view document clustering methods separate multi-view document representation from the clustering process and ignore the complementary characteristics among views, an iterative algorithm for complementary multi-view document clustering, CMDC, was proposed, in which the multi-view document clustering process and the multi-view feature adjustment were optimized in a unified manner. In CMDC, complementary text documents were selected from the clustering results of different views and used to promote feature tuning for clustering, by learning a local metric for each document view. The partition consistency of multi-view document clustering was achieved through the metric consistency of the views. Experimental results show that CMDC effectively improves multi-view clustering performance.

Key words: multi-view document clustering, complementary text, constrained document clustering, metric calculation
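
The abstract gives only the outline of CMDC and does not say how complementary documents are chosen or which local metric learning method is used. The following is a minimal, purely illustrative Python sketch of one step of such a scheme, built on assumptions that are ours and not the paper's: (1) two views have already been clustered into k clusters; (2) a "complementary" document is one whose cluster id differs between the two views after the ids are aligned by brute-force matching; (3) the "local metric" is a crude diagonal feature weighting, where each feature of view A is weighted by the inverse of its mean squared difference over must-link pairs suggested by view B. All function names and the toy data are hypothetical; this is not the CMDC algorithm.

```python
from itertools import permutations

# NOTE: every helper below is a hypothetical illustration, not code from the paper.


def align_labels(ref, other, k):
    """Permute cluster ids in `other` (ids in range(k)) to agree with `ref` as often as possible.

    Brute force over all k! permutations, so only suitable for small k.
    """
    best_perm, best_hits = None, -1
    for perm in permutations(range(k)):
        hits = sum(1 for a, b in zip(ref, other) if perm[b] == a)
        if hits > best_hits:
            best_perm, best_hits = perm, hits
    return [best_perm[b] for b in other]


def complementary_docs(labels_a, labels_b, k):
    """Assumed definition: documents whose aligned cluster ids differ across the two views.

    Returns (indices of disagreeing documents, view-B labels aligned to view A's ids).
    """
    aligned_b = align_labels(labels_a, labels_b, k)
    docs = [i for i, (a, b) in enumerate(zip(labels_a, aligned_b)) if a != b]
    return docs, aligned_b


def must_link_pairs(docs, guide_labels):
    """Pair each complementary doc with every other doc the guiding view puts in its cluster."""
    return [(i, j) for i in docs
            for j, lab in enumerate(guide_labels)
            if j != i and lab == guide_labels[i]]


def diagonal_weights(X, pairs, eps=1e-6):
    """Weight each feature by 1 / (mean squared difference over must-link pairs).

    Features that stay stable within must-link pairs get larger weights.
    Weights are rescaled so they average 1; with no pairs, all weights are 1.
    """
    d = len(X[0])
    if not pairs:
        return [1.0] * d
    msd = [0.0] * d
    for i, j in pairs:
        for f in range(d):
            msd[f] += (X[i][f] - X[j][f]) ** 2
    raw = [1.0 / (m / len(pairs) + eps) for m in msd]
    scale = d / sum(raw)
    return [r * scale for r in raw]


def weighted_sq_dist(x, y, w):
    """Squared distance under the diagonal metric w."""
    return sum(wf * (a - b) ** 2 for wf, a, b in zip(w, x, y))


def nearest(X, i, w):
    """Index of the nearest other document to document i under weights w."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: weighted_sq_dist(X[i], X[j], w))


# Invented toy data: 6 documents, 2 topics. In view A, feature 0 tracks the topic
# and feature 1 is noise; view A misplaces document 2, view B is correct.
X_a = [[0.0, 1.0], [0.2, 0.0], [0.1, 9.0], [1.0, 1.0], [0.9, 0.5], [1.1, 8.0]]
labels_a = [0, 0, 1, 1, 1, 1]
labels_b = [1, 1, 1, 0, 0, 0]  # same partition as the topics, different cluster ids

comp, aligned_b = complementary_docs(labels_a, labels_b, k=2)
pairs = must_link_pairs(comp, aligned_b)
w = diagonal_weights(X_a, pairs)
print(comp, pairs)  # → [2] [(2, 0), (2, 1)]
print(nearest(X_a, 2, [1.0, 1.0]), nearest(X_a, 2, w))  # → 5 0
```

In this toy run, the learned weights suppress the noisy second feature, so document 2's nearest neighbor in view A moves from document 5 (other topic) to document 0 (same topic). An iterative scheme in the spirit of the abstract might alternate such a reweighting step with re-clustering each view until the partitions stop changing; for anything beyond k of about 8, the brute-force alignment should be replaced by a proper assignment solver.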