Big Data Research (大数据) ›› 2022, Vol. 8 ›› Issue (6): 94-104. doi: 10.11959/j.issn.2096-0271.2022041

• Research •

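Research hotspot prediction based on hyperbolic space graph embedding

DAI Jun (戴筠)

  1. Shanghai University, Shanghai 200041
  • Publication date: 2022-11-15  Release date: 2022-11-01
  • About the author: DAI Jun (1966- ), female, associate professor at Shanghai University. Her main research interests are data mining and machine learning.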

Emerging scientific topic prediction based on Poincaré graph embedding

Jun DAI   

  1. Shanghai University, Shanghai 200041, China
  • Online: 2022-11-15  Published: 2022-11-01

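Abstract (translated from the Chinese):

Predicting research hotspots helps scientific research to be carried out effectively and scientific resources to be allocated better. Data mining and machine learning algorithms have been widely applied to research hotspot prediction, for example topic modeling based on the text of papers and algorithms that mine paper citation counts. This paper proposes a new algorithm, hyperbolic space keyword graph embedding (PKGM), which embeds keyword information in hyperbolic space. It builds a keyword network from keywords and the relations among them, and estimates the probability that an edge exists between two nodes from their distance in hyperbolic space, thereby predicting research hotspots. An experimental comparison with seven baseline algorithms shows that PKGM improves AUROC by 7.3% and AP by 5.8% over the best-performing Euclidean-space embedding algorithm, and improves AUROC by 10.8% and AP by 7.2% over the hyperbolic graph neural network algorithm. These results demonstrate the effectiveness of PKGM.

Keywords: research hotspot, hyperbolic space, Poincaré model, graph embedding, keyword network, long-tail effect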
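
The abstract says that the probability of an edge is judged from the distance between two nodes in hyperbolic space, but it does not give the formulas. As a minimal sketch, assuming the standard Poincaré ball distance and a Fermi-Dirac style mapping from distance to probability (a common choice in the Poincaré embedding literature, not confirmed as the paper's own), the idea could look like this. The function names and the `radius` and `temperature` parameters are hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def edge_probability(u, v, radius=2.0, temperature=1.0):
    """Assumed Fermi-Dirac mapping: the closer two nodes, the likelier an edge.

    `radius` and `temperature` are illustrative hyperparameters, not values
    taken from the paper.
    """
    d = poincare_distance(u, v)
    return 1.0 / (math.exp((d - radius) / temperature) + 1.0)


if __name__ == "__main__":
    near = edge_probability((0.1, 0.0), (0.2, 0.0))
    far = edge_probability((0.1, 0.0), (-0.9, 0.0))
    print(f"near pair: {near:.3f}, far pair: {far:.3f}")
```

Under these assumptions, two keywords embedded close together score a higher edge probability than two keywords far apart, and points near the boundary of the ball are far from almost everything, which is what lets the embedding represent hierarchical, long-tailed networks compactly.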

Abstract:

Predicting emerging scientific topics helps direct research effort and allocate scientific resources more effectively. Machine learning and data mining approaches have been widely applied to this task, including topic models built on paper content and models that predict citation counts. This paper proposes PKGM (Poincaré keywords graph embedding), a scientific topic prediction algorithm that uses keywords and the relations among them to build a keyword network, embeds that network in hyperbolic space, and uses the distance between two nodes to predict the probability that an edge exists between them. In a comparison with seven baselines, PKGM improved AUROC by 7.3% and AP by 5.8% over the best Euclidean-space method, and improved AUROC by 10.8% and AP by 7.2% over the best hyperbolic-space approach. These results demonstrate the effectiveness of PKGM.

Key words: scientific topic, hyperbolic space, Poincaré model, graph embedding, keyword network, long-tail effect
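
The abstract reports results as AUROC and AP on link prediction. For readers unfamiliar with the two metrics, the sketch below computes both from a list of binary edge labels and predicted scores. It is a plain illustration of the usual definitions, not the paper's evaluation code; the function names are hypothetical, and ties in `average_precision` are broken by input order, which can differ from library implementations.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return precision_sum / hits


if __name__ == "__main__":
    # 1 = the edge exists in the held-out keyword network, 0 = it does not.
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]
    print(auroc(labels, scores))              # 0.75
    print(average_precision(labels, scores))  # (1 + 2/3) / 2
```

AUROC is insensitive to the ratio of existing to missing edges, while AP rewards placing true edges at the very top of the ranking, so reporting both gives a fuller picture when true edges are rare.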
