Journal on Communications

• Academic Communication •

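Microblog hot topic detection method based on meaningful string clustering

He Min1,2, Wang Lihong2, Du Pan1, Zhang Jin1, Cheng Xueqi1

  1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; 2. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029
  • Online: 2013-08-25 Published: 2013-12-16
  • Supported by:
    National Key Technology R&D Program of China (2012BAH46B01); National Natural Science Foundation of China (61170230)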

Abstract: To address the sparse features and fragmented content of microblog data, a hot topic detection method based on meaningful string clustering is proposed. Repeated string computation, context adjacency analysis, and language rule filtering are combined to extract meaningful strings, that is, strings that express an independent and complete meaning. The microblog data are then modeled in the comparatively small meaningful-string space, candidate topics are generated by clustering, and hot topics are identified by ranking the candidates by hotness. Experiments on microblog data show that the method reduces, to some extent, the dimensionality of the high-dimensional sparse microblog space and is effective and feasible for hot topic detection in microblogs.
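
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas or thresholds, so every concrete choice here is an assumption. Repeated strings are counted as character n-grams, context adjacency is approximated by the number of distinct left and right neighbor characters, the language rule is a hypothetical stop-character list (`stop_chars`), strings are grouped by a single-pass Jaccard clustering over the posts they occur in, and hotness is taken to be the number of posts a cluster covers. All function names and parameters are illustrative.

```python
from collections import Counter


def repeated_strings(posts, min_len=2, max_len=4, min_freq=2):
    """Count character n-grams and keep those occurring at least min_freq times."""
    counts = Counter()
    for post in posts:
        for n in range(min_len, max_len + 1):
            for i in range(len(post) - n + 1):
                counts[post[i:i + n]] += 1
    return {s: c for s, c in counts.items() if c >= min_freq}


def adjacency_variety(posts, candidate):
    """Return (distinct left neighbors, distinct right neighbors) of candidate.

    A post boundary is recorded as None so it counts as one extra neighbor.
    """
    left, right = set(), set()
    for post in posts:
        start = post.find(candidate)
        while start != -1:
            end = start + len(candidate)
            left.add(post[start - 1] if start > 0 else None)
            right.add(post[end] if end < len(post) else None)
            start = post.find(candidate, start + 1)
    return len(left), len(right)


def meaningful_strings(posts, min_variety=2, stop_chars=frozenset("的了是在")):
    """Keep repeated strings with varied context on both sides.

    stop_chars is a hypothetical stand-in for the paper's language rules:
    candidates that begin or end with one of these characters are dropped.
    """
    kept = set()
    for s in repeated_strings(posts):
        if s[0] in stop_chars or s[-1] in stop_chars:
            continue
        n_left, n_right = adjacency_variety(posts, s)
        if n_left >= min_variety and n_right >= min_variety:
            kept.add(s)
    return kept


def cluster_strings(posts, vocab, threshold=0.5):
    """Single-pass clustering of strings by Jaccard overlap of their post sets."""
    clusters = []  # each cluster: {"strings": [...], "posts": set of post indices}
    for s in sorted(vocab):  # sorted only to make the result deterministic
        docs = {i for i, post in enumerate(posts) if s in post}
        best, best_sim = None, 0.0
        for c in clusters:
            sim = len(docs & c["posts"]) / len(docs | c["posts"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["strings"].append(s)
            best["posts"] |= docs
        else:
            clusters.append({"strings": [s], "posts": set(docs)})
    return clusters


def hot_topics(posts, top_k=3):
    """Rank candidate topics by the number of posts they cover (assumed hotness)."""
    clusters = cluster_strings(posts, meaningful_strings(posts))
    clusters.sort(key=lambda c: len(c["posts"]), reverse=True)
    return clusters[:top_k]
```

Calling `hot_topics(posts)` on a list of post strings returns up to `top_k` clusters, each holding its meaningful strings and the indices of the posts that contain them. Because only the extracted strings serve as features, the feature space is much smaller than the full n-gram space, which mirrors the dimensionality reduction the abstract describes.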
