Journal on Communications (通信学报)

• Academic Paper

Semi-supervised learning based on a query-document heterogeneous information network

LIU Yu-feng, LI Ren-fa

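  1. College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China; 2. Embedded Systems and Networks Laboratory, Hunan University, Changsha, Hunan 410082, China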
  • Publication date: 2014-08-25  Release date: 2014-08-15
  • Funding:
    National Natural Science Foundation of China (61173036)

Semi-supervised learning by constructing query-document heterogeneous information network


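Abstract (Chinese version): Graph-based semi-supervised learning has been widely studied in recent years; however, most existing semi-supervised learning algorithms can only be applied to homogeneous networks. A query-document heterogeneous information network is constructed from the content features of the queries and documents themselves and from the click relations between them, and the discriminative information of the labeled samples is introduced to strengthen the network structure. A regularization framework and an iterative algorithm for semi-supervised clustering on the query-document heterogeneous information network are proposed. In the regularization framework, a cost function on the heterogeneous information network is constructed under the manifold assumption, and its closed-form solution is derived and used to predict the class labels of unlabeled queries and documents. Experiments on query logs from a large-scale commercial search engine show that the proposed method outperforms traditional semi-supervised learning methods.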
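The abstract does not state the cost function itself. The following is a minimal sketch of how a manifold-assumption cost with a closed-form solution can be set up on such a network. It assumes the network is flattened into one weighted graph whose adjacency has blocks for query-query content similarity, document-document content similarity and query-document clicks. It also uses the local-and-global-consistency objective of Zhou et al., which may differ from the authors' own function. Setting the gradient of Q(F) = ½ Σ W_ij ||F_i/√D_ii − F_j/√D_jj||² + μ||F − Y||² to zero gives F* = (1 − β)(I − βS)⁻¹Y, where S = D^(−1/2) W D^(−1/2) and β = 1/(1 + μ). The function names, the weights `lam_q`, `lam_d`, `lam_qd` and `mu`, and the toy data are all hypothetical.

```python
import math


def build_heterogeneous_adjacency(w_q, w_d, clicks, lam_q=1.0, lam_d=1.0, lam_qd=1.0):
    """Hypothetical flattening, queries first then documents:
    [[lam_q * W_q, lam_qd * R], [lam_qd * R^T, lam_d * W_d]]."""
    n_q, n_d = len(w_q), len(w_d)
    w = [[0.0] * (n_q + n_d) for _ in range(n_q + n_d)]
    for i in range(n_q):
        for j in range(n_q):
            w[i][j] = lam_q * w_q[i][j]  # query-query content similarity
    for i in range(n_d):
        for j in range(n_d):
            w[n_q + i][n_q + j] = lam_d * w_d[i][j]  # document-document content similarity
    for i in range(n_q):
        for j in range(n_d):
            w[i][n_q + j] = w[n_q + j][i] = lam_qd * clicks[i][j]  # symmetric click edges
    return w


def normalize(w):
    """S = D^{-1/2} W D^{-1/2}; an isolated node gets a zero row and column."""
    inv = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in w]
    n = len(w)
    return [[inv[i] * w[i][j] * inv[j] for j in range(n)] for i in range(n)]


def solve(a, b):
    """Solve A X = B by Gaussian elimination with partial pivoting (B has c columns)."""
    n, c = len(a), len(b[0])
    m = [list(row) + list(rhs) for row, rhs in zip(a, b)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for j in range(k, n + c):
                m[r][j] -= f * m[k][j]
    x = [[0.0] * c for _ in range(n)]
    for i in range(n - 1, -1, -1):  # back substitution
        for col in range(c):
            s = m[i][n + col] - sum(m[i][j] * x[j][col] for j in range(i + 1, n))
            x[i][col] = s / m[i][i]
    return x


def closed_form_labels(w, y, mu=0.5):
    """F* = (1 - beta) (I - beta S)^{-1} Y with beta = 1 / (1 + mu); return argmax classes."""
    beta = 1.0 / (1.0 + mu)
    s = normalize(w)
    n = len(w)
    a = [[(1.0 if i == j else 0.0) - beta * s[i][j] for j in range(n)] for i in range(n)]
    b = [[(1.0 - beta) * v for v in row] for row in y]
    f = solve(a, b)
    return [max(range(len(row)), key=lambda k: row[k]) for row in f]


w_q = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]     # q0 and q1 share content
w_d = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]     # d0 and d1 share content
clicks = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # query q_i clicked document d_i
y = [[1, 0], [0, 0], [0, 0],                # rows q0..q2: q0 labeled class 0
     [0, 0], [0, 0], [0, 1]]                # rows d0..d2: d2 labeled class 1
labels = closed_form_labels(build_heterogeneous_adjacency(w_q, w_d, clicks), y)
print(labels)  # [0, 0, 1, 0, 0, 1]
```

Because β < 1 and the eigenvalues of S lie in [−1, 1], I − βS is always invertible, so a direct solve is enough for a toy network. A dense solve like this one does not scale to a real query log.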

Abstract: Various graph-based algorithms for semi-supervised learning have been proposed in the recent literature. However, although classification on homogeneous networks has been studied for decades, classification on heterogeneous networks has only recently begun to be explored. This paper considers the semi-supervised classification problem on a query-document heterogeneous information network, which combines the query-document click bipartite graph with the content information of both sides. To strengthen the network structure, the class information of labeled sample nodes is introduced. A semi-supervised learning algorithm is developed within two frameworks: a novel graph-based regularization framework and an iterative framework. In the regularization framework, a new cost function is developed that accounts for both the direct relationships between the two entity sets (queries and documents) and the content information of each side; this leads to a significant improvement over the baseline methods. Experimental results on query logs from a large-scale commercial search engine show that the proposed method achieves the best performance among the compared methods, with consistent and promising improvements.
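The abstract also mentions an iterative framework. For illustration only, a common iterative counterpart of a normalized-graph regularizer is assumed here. It repeats F ← βSF + (1 − β)Y, with S = D^(−1/2) W D^(−1/2). Because β < 1 and the eigenvalues of S lie in [−1, 1], this iteration converges to the minimizer (1 − β)(I − βS)⁻¹Y of the corresponding cost. The authors' actual update rule may differ. The sketch works directly on edge lists, so content-similarity edges and click edges can be weighted separately. The function `propagate`, its parameters `lam_content`, `lam_click`, `beta`, `max_iter` and `tol`, and the class names are hypothetical.

```python
import math
from collections import defaultdict


def propagate(content_edges, click_edges, seeds, classes,
              lam_content=1.0, lam_click=1.0, beta=2 / 3, max_iter=500, tol=1e-10):
    """Iterate F <- beta * S F + (1 - beta) * Y on a query-document network.

    content_edges: (u, v, sim) pairs within the query side or the document side.
    click_edges:   (query, doc, clicks) pairs taken from a click log.
    seeds:         dict node -> class label for the labeled queries/documents.
    """
    w = defaultdict(lambda: defaultdict(float))
    for edges, lam in ((content_edges, lam_content), (click_edges, lam_click)):
        for u, v, weight in edges:  # undirected edges; assumes u != v
            w[u][v] += lam * weight
            w[v][u] += lam * weight
    nodes = sorted(set(w) | set(seeds))
    deg = {n: sum(w[n].values()) for n in nodes}
    inv = {n: 1.0 / math.sqrt(deg[n]) if deg[n] > 0 else 0.0 for n in nodes}
    y = {n: [1.0 if seeds.get(n) == c else 0.0 for c in classes] for n in nodes}
    f = {n: row[:] for n, row in y.items()}  # start from the label matrix Y
    k_range = range(len(classes))
    for _ in range(max_iter):
        new = {}
        for n in nodes:
            spread = [0.0] * len(classes)
            for m, weight in w[n].items():
                s_nm = inv[n] * weight * inv[m]  # entry of D^{-1/2} W D^{-1/2}
                for k in k_range:
                    spread[k] += s_nm * f[m][k]
            new[n] = [beta * spread[k] + (1 - beta) * y[n][k] for k in k_range]
        delta = max(abs(new[n][k] - f[n][k]) for n in nodes for k in k_range)
        f = new
        if delta < tol:  # converged to the fixed point
            break
    # Nodes with no path to any labeled node keep all-zero scores and fall back to classes[0].
    return {n: classes[max(k_range, key=lambda k: f[n][k])] for n in nodes}


content = [("q0", "q1", 1.0), ("d0", "d1", 1.0)]                  # content similarity
clicks = [("q0", "d0", 1.0), ("q1", "d1", 1.0), ("q2", "d2", 1.0)]  # click relations
result = propagate(content, clicks, {"q0": "travel", "d2": "sports"}, ["travel", "sports"])
print(result)
# {'d0': 'travel', 'd1': 'travel', 'd2': 'sports', 'q0': 'travel', 'q1': 'travel', 'q2': 'sports'}
```

Each sweep touches only existing edges, so its cost grows with the number of edges rather than with the square of the number of nodes. That is the usual reason to prefer the iterative form over a dense closed-form solve on large click graphs.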
