通信学报 (Journal on Communications) ›› 2012, Vol. 33 ›› Issue (12): 43-48. doi: 10.3969/j.issn.1000-436x.2012.12.006

• Academic paper •

用于文本相似度计算的新核函数

王秀红1,2,3,4,鞠时光1

  1 江苏大学 科技信息研究所,江苏 镇江 212013
  2 江苏大学 理学院,江苏 镇江 212013
  3 加州大学戴维斯分校 农业与环境科学学院,加利福尼亚州 戴维斯 95616
  4 江苏大学 计算机科学与通信工程学院,江苏 镇江 212013
  • 出版日期:2012-12-25 发布日期:2017-07-15

Novel kernel function for computing the similarity of text

Xiu-hong WANG1,2,3,4, Shi-guang JU1

  1 Institute of Science and Technology Information, Jiangsu University, Zhenjiang 212013, China
  2 Faculty of Science, Jiangsu University, Zhenjiang 212013, China
  3 College of Agricultural and Environmental Sciences, University of California-Davis, Davis 95616, USA
  4 School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang 212013, China
  • Online: 2012-12-25 Published: 2017-07-15

摘要:

为了提高文本相似检测的综合表现,在文本文档相似特征的基础上构造了新的核函数S_Wang核函数。结合文本相似计算过程中的实际情况,将待比对的文本表示成向量,考虑通过2个向量间的乘积和欧氏距离来描述向量之间的相似程度,从而构造了适合文本相似度计算的新核函数,并根据 Mercer 定理证明了所构造函数可以作为核函数。实验验证了新构造的核函数在文本文档相似度计算中的表现,实验结果表明S_Wang核其相似度计算精度和综合指标均分别优于Cauchy核、潜在语义核(LSK)以及CLA复合核。S_Wang核适用于文本相似度计算。

关键词: 信息检索, 文本相似度, 核函数, S_Wang核, 潜在语义核, Cauchy核, CLA复合核

Abstract:

To improve the overall performance of similar-document detection, a novel kernel function, the S_Wang kernel, was constructed from the similarity features of text documents. The documents to be compared are represented as vectors, and the kernel describes their similarity through both the Euclidean distance and the angle between those vectors. The constructed function was proved to be a valid kernel function according to Mercer's theorem. Experiments evaluated the kernels on text document similarity computation. The results show that the S_Wang kernel performs significantly better in precision and F1 than the Cauchy kernel, the latent semantic kernel (LSK), and the CLA kernel. The S_Wang kernel is therefore suitable for text similarity computation.

Key words: information retrieval, text similarity, kernel function, S_Wang kernel, LSK, Cauchy kernel, CLA kernel
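
The abstract does not give the S_Wang formula, so the following is only an illustrative sketch of the two ingredients it names (distance between document vectors and the angle between them) and of a numerical spot check related to Mercer's condition. It uses the standard Cauchy kernel, k(u, v) = 1 / (1 + ||u - v||^2 / sigma^2), as the distance-based example and cosine similarity as the angle-based one. The helper names, the toy documents, the bag-of-words representation, and `sigma = 1.0` are all assumptions made for illustration, not the authors' setup.

```python
import math
import random
from collections import Counter


def bag_of_words(text, vocab):
    """Represent a document as a term-count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Angle-based similarity: cosine of the angle between u and v."""
    norm = math.sqrt(dot(u, u) * dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0


def cauchy_kernel(u, v, sigma=1.0):
    """Distance-based similarity: the standard Cauchy kernel (sigma is assumed)."""
    return 1.0 / (1.0 + squared_distance(u, v) / sigma ** 2)


def gram_matrix(vectors, kernel):
    return [[kernel(u, v) for v in vectors] for u in vectors]


def quadratic_form(K, c):
    """c^T K c, which is non-negative for every c when K is positive semidefinite."""
    n = len(c)
    return sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))


# Toy documents (hypothetical, for illustration only).
docs = [
    "kernel function for text similarity",
    "kernel function for document similarity",
    "weather forecast for the weekend",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
vectors = [bag_of_words(d, vocab) for d in docs]

K = gram_matrix(vectors, cauchy_kernel)

# Spot check of positive semidefiniteness on random coefficient vectors.
# This is a necessary condition tested on one sample, not a proof.
rng = random.Random(0)
min_form = min(
    quadratic_form(K, [rng.uniform(-1, 1) for _ in docs]) for _ in range(1000)
)

print("Cauchy kernel, doc 0 vs 1:", K[0][1])
print("Cauchy kernel, doc 0 vs 2:", K[0][2])
print("Cosine, doc 0 vs 1:", cosine_similarity(vectors[0], vectors[1]))
print("Smallest sampled quadratic form:", min_form)
```

In this toy setting both measures rank the two kernel-related documents as more similar to each other than to the unrelated one. A kernel that combines distance and angle, as the abstract describes, would need its own proof of validity; the random quadratic-form check above can only reveal violations, never establish that a function is a kernel.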
