Journal on Communications (通信学报) ›› 2016, Vol. 37 ›› Issue (3): 48-54. doi: 10.11959/j.issn.1000-436x.2016052

• Academic paper

基于时间序列分析的微博突发话题检测方法

贺敏1,2,徐杰2,杜攀1,程学旗1,王丽宏2   

  1. 中国科学院计算技术研究所,北京 100080
  2. 国家计算机网络应急技术处理协调中心,北京 100029
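  • Publication date: 2016-03-25 Release date: 2017-08-04
  • Funding:
    National High Technology Research and Development Program of China ("863" Program); National Key Technology Support Program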

Bursty topic detection method for microblog based on time series analysis

Min HE1,2, Jie XU2, Pan DU1, Xue-qi CHENG1, Li-hong WANG2

  1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
  2. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
  • Online: 2016-03-25 Published: 2017-08-04
  • Supported by:
    The National High Technology Research and Development Program of China (863 Program); The National Key Technology Support Program

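Abstract (translated from the Chinese):

To address the heavy noise in microblog content and the difficulty of judging novelty, a microblog bursty topic detection method based on time series analysis is proposed as an optimization of the momentum model. After candidate bursty features are extracted with the momentum model, the momentum time series of each feature is modeled by drawing on signal frequency-domain analysis theory and stock trend analysis theory. The frequency-domain characteristics of a feature are analyzed to identify frequent pseudo-bursty features, and the novelty of a feature is analyzed to identify intermittent pseudo-bursty features. The effective bursty features that remain after filtering are merged to form bursty topics. Experiments on microblog data show that the method effectively improves the precision and F-measure of bursty topic detection.

Keywords (translated from the Chinese): bursty topic, microblog, bursty feature, time series analysis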

Abstract:

Detecting bursty topics in microblogs is an important task for understanding the current events that attract large numbers of Internet users. However, existing methods designed for news articles cannot be applied directly to microblogs, because microblogs differ from formal texts in their diversity, dynamics, and noise. A bursty topic detection method for microblogs based on time series analysis is proposed as an optimization of the momentum model. Candidate bursty features are first extracted with the momentum model. The momentum time series of each feature is then modeled using frequency-domain analysis theory and stock trend analysis theory. Frequent pseudo-bursty features are filtered out according to their frequency-domain characteristics, and intermittent pseudo-bursty features are filtered out according to a novelty analysis based on stock trend theory. The remaining effective bursty features are combined to form bursty topics. Experiments on a real Sina microblog data set show that the proposed method markedly improves precision and F-measure compared with the momentum model.

Key words: bursty topic, microblog, bursty feature, time series analysis
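The abstract names three processing steps (momentum extraction, frequency-domain filtering, novelty filtering) without giving formulas. The following Python sketch only illustrates the general shape such a pipeline could take; it is not the authors' implementation. Everything in it is an assumption made for illustration: the MACD-style momentum (frequency times the gap between a short and a long exponential moving average), the spectral-concentration score, the `is_novel` rule, and all parameter values and function names.

```python
import cmath
from typing import List, Sequence


def ema(series: Sequence[float], span: int) -> List[float]:
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out: List[float] = []
    for x in series:
        out.append(x if not out else alpha * x + (1 - alpha) * out[-1])
    return out


def momentum(freq: Sequence[float], short: int = 3, long: int = 6) -> List[float]:
    """ASSUMED form: mass (term frequency) times velocity (short EMA minus long EMA)."""
    fast, slow = ema(freq, short), ema(freq, long)
    return [m * (f - s) for m, f, s in zip(freq, fast, slow)]


def dominant_period_share(series: Sequence[float]) -> float:
    """Fraction of non-DC spectral power held by the single strongest frequency bin."""
    n = len(series)
    if n < 2:
        return 0.0
    mean = sum(series) / n
    centered = [x - mean for x in series]
    power = []
    # Naive DFT over the positive frequencies; fine for short series.
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, c in enumerate(centered))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    return max(power) / total if total > 0 else 0.0


def is_novel(history: Sequence[float], current_peak: float, ratio: float = 1.5) -> bool:
    """ASSUMED rule: a peak is novel if it clearly exceeds every earlier peak,
    loosely analogous to a price breaking through a previous high."""
    return not history or current_peak > ratio * max(history)
```

Under these assumptions, a momentum series whose `dominant_period_share` is close to 1 is dominated by one recurring cycle and would be a candidate frequent pseudo-bursty feature, while a feature whose latest peak fails `is_novel` against its own history would be a candidate intermittent pseudo-bursty feature. The cut-off values would have to be tuned on real data.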
