大数据 (Big Data)


Predicting Scientific Research Hotspots with Hyperbolic-Space Graph Embedding

戴筠   

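  1. Shanghai University, Shanghai 200041
  • About the author: Dai Jun (戴筠, 1966-), female, born in Shanghai, associate professor at Shanghai University. Her main research interests are data mining and machine learning.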

Scientific Topic Prediction via Poincare Keywords Graph Embedding

Dai Jun   

  1. Shanghai University, Shanghai 200041

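Abstract: Predicting research hotspots helps researchers carry out scientific work effectively and allocate scientific resources better. Machine learning and data mining are already widely used for research hotspot prediction, for example topic models built on the text of papers and algorithms that mine citation counts. This paper proposes PKGM (Poincare Keywords Graph Embedding), a new algorithm that embeds keyword information into hyperbolic space: it builds a keyword network from keywords and the relations among them, then uses the distance between two nodes in hyperbolic space to estimate the probability that an edge exists between them, and thereby predicts research hotspots. In experiments against seven baseline algorithms, PKGM improves AUROC by 7.3% and AP by 5.8% over the best-performing Euclidean-space embedding, and improves AUROC by 10.8% and AP by 7.2% over the hyperbolic graph neural network algorithm, which demonstrates its effectiveness.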

Keywords: research hotspots, hyperbolic space, Poincaré model, graph embedding, keyword network, long-tail effect

Abstract: Predicting scientific topics is central to scientific research and could substantially improve the allocation of scientific resources. Machine learning and data mining approaches have been widely applied to scientific topic prediction, including topic models based on paper content and citation prediction models. In this paper, we propose PKGM (Poincare Keywords Graph Embedding), a novel scientific topic prediction method that builds a network from keywords and the relations among them and embeds it into hyperbolic space. The method predicts scientific topics by predicting links in this network from the hyperbolic distance between nodes. We compared our approach with seven baselines on a large-scale real-world dataset. It improved AUROC by 7.3% and AP by 5.8% over the best Euclidean-space method, and improved AUROC by 10.8% and AP by 7.2% over the best hyperbolic-space approach, which demonstrates the effectiveness of PKGM.

Key words: scientific topic, hyperbolic space, Poincaré model, graph embedding, keyword network, long-tail effect
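
The abstract's core step, scoring a possible edge between two keywords by their distance in hyperbolic space, can be illustrated with a minimal sketch. The distance function below is the standard geodesic distance in the Poincaré ball model. Everything else is an assumption made for illustration and is not taken from the paper: the Fermi-Dirac style mapping from distance to probability, the `radius` and `temperature` parameters, and the example keywords and coordinates are all hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return math.acosh(arg)


def edge_probability(u, v, radius=2.0, temperature=1.0):
    """Map a hyperbolic distance to an edge probability.

    Assumption: a Fermi-Dirac style decoder, chosen here only for
    illustration; the paper's actual scoring function may differ.
    """
    d = poincare_distance(u, v)
    return 1.0 / (math.exp((d - radius) / temperature) + 1.0)


# Hypothetical 2-D embeddings for three keywords (made-up coordinates).
embeddings = {
    "graph embedding": (0.10, 0.05),
    "hyperbolic space": (0.15, 0.10),
    "protein folding": (-0.80, 0.50),
}

p_near = edge_probability(embeddings["graph embedding"], embeddings["hyperbolic space"])
p_far = edge_probability(embeddings["graph embedding"], embeddings["protein folding"])
print(f"near pair: {p_near:.3f}, far pair: {p_far:.3f}")
```

In this sketch, keyword pairs that sit close together in the ball receive a higher edge probability than distant pairs, which is the ranking signal a link-prediction evaluation with AUROC or AP would consume.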
