Journal on Communications ›› 2016, Vol. 37 ›› Issue (12): 50-55. doi: 10.11959/j.issn.1000-436x.2016239

• Academic paper

Feature calculation method for microblog short text based on analytic hierarchy process

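邹学强 (1,2,3), 包秀国 (2), 黄晓军 (4), 马宏远 (2), 袁庆升 (1,2,3)

  1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093
  2. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029
  3. University of Chinese Academy of Sciences, Beijing 100049
  4. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876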
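  • Online: 2016-12-25 Published: 2017-05-15
  • Supported by:
    The National High Technology Research and Development Program ("863" Program); The National Natural Science Foundation of China; The National Natural Science Foundation of China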

Calculating the feature method of short text based on analytic hierarchy process

Xue-qiang ZOU (1,2,3), Xiu-guo BAO (2), Xiao-jun HUANG (4), Hong-yuan MA (2), Qing-sheng YUAN (1,2,3)

  1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  2. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
  3. University of Chinese Academy of Sciences, Beijing 100049, China
  4. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Online: 2016-12-25 Published: 2017-05-15
  • Supported by:
    The National High Technology Research and Development Program (863 Program); The National Natural Science Foundation of China; The National Natural Science Foundation of China

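Abstract:

To build precise user interest models and effectively discover groups of users with similar interests, this paper proposes a feature calculation method for microblog short texts for use in clustering algorithms, improving clustering quality so that sets of similar interests among microblog users can be mined more effectively. The method combines several key indicators, including the numbers of retweets, comments, and likes of a microblog post, to measure the importance of short-text features. It also introduces the analytic hierarchy process to improve the traditional tf-idf feature calculation method, and is evaluated experimentally with a classic text clustering algorithm. The experimental results show that, compared with the traditional tf-idf feature calculation method, the improved short-text feature calculation method achieves better intra-cluster concentration and inter-cluster dispersion.

Key words: analytic hierarchy process, feature calculation, text clustering, short text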

Abstract:

To model the interest preferences of microblog users accurately and discover user groups with similar interests, a new text feature calculation method was proposed that considers the total numbers of retweets, comments, and attitudes (likes) of each microblog post and applies the classic analytic hierarchy process. The proposed method uses these three indicators to evaluate the importance of the text feature representation and improves the traditional tf-idf feature calculation method to suit short text. The method was also applied in a traditional clustering algorithm. Experimental results show that, compared with the traditional tf-idf method, the improved approach clusters better in terms of the average scattering for clusters and the total separation between clusters.

Key words: analytic hierarchy process, feature calculation, text clustering, short text
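
The abstract does not give the weighting formula, so the following is only a minimal sketch of how analytic hierarchy process (AHP) weights over retweets, comments, and likes might feed into a tf-idf calculation. The pairwise judgments in `PAIRWISE`, the log damping of the counts, the baseline of 1.0, and the aggregation of posts into one group vector are all illustrative assumptions, not the authors' method. The AHP weights are approximated with the standard row geometric-mean method.

```python
import math
from collections import Counter

# Hypothetical pairwise judgments (assumption, not from the paper):
# retweets vs comments = 2, retweets vs likes = 4, comments vs likes = 2.
PAIRWISE = [
    [1.0, 2.0, 4.0],
    [1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 2, 1.0],
]

def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def post_importance(post, weights):
    """Combine log-damped retweet, comment, and like counts into one score.

    The baseline of 1.0 keeps posts with no interactions from vanishing
    (an assumption made for this sketch).
    """
    counts = (post["retweets"], post["comments"], post["likes"])
    return 1.0 + sum(w * math.log1p(c) for w, c in zip(weights, counts))

def inverse_document_frequency(posts):
    """Plain idf over posts: log(N / df)."""
    n_docs = len(posts)
    df = Counter(term for p in posts for term in set(p["tokens"]))
    return {t: math.log(n_docs / d) for t, d in df.items()}

def weighted_tfidf(posts, idf, weights):
    """Importance-weighted tf-idf vector for a group of posts (e.g. one user's).

    Each term occurrence counts in proportion to the importance of its post.
    """
    tf = Counter()
    for p in posts:
        score = post_importance(p, weights)
        for term in p["tokens"]:
            tf[term] += score
    total = sum(tf.values())
    if total == 0:
        return {}
    return {t: (v / total) * idf.get(t, 0.0) for t, v in tf.items()}

weights = ahp_weights(PAIRWISE)  # roughly [0.571, 0.286, 0.143]
```

With these weights, a term that appears in a heavily retweeted post contributes more to the group vector than the same term in a post nobody interacted with. The resulting vectors could then be passed to any standard clustering algorithm.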
