Journal on Communications (通信学报) ›› 2014, Vol. 35 ›› Issue (8): 40-47. doi: 10.3969/j.issn.1000-436x.2014.08.006

• Academic paper •

Semi-supervised learning by constructing query-document heterogeneous information network

Yu-feng LIU (刘钰峰)1, Ren-fa LI (李仁发)1,2

  1 School of Information Science and Engineering, Hunan University, Changsha 410082, China
  2 Embedded System and Networking Laboratory, Hunan University, Changsha 410082, China
  • Online: 2014-08-25 Published: 2017-06-29
  • Supported by:
    The National Natural Science Foundation of China

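Abstract (translated from the Chinese):

Graph-based semi-supervised learning has been widely studied in recent years; however, most existing semi-supervised learning algorithms can only be applied to homogeneous networks. A query-document heterogeneous information network is constructed from the content features of queries and documents and from the click relationships between them, and discriminative information from the samples is introduced to strengthen the network structure. A regularization framework and an iterative algorithm for semi-supervised clustering on the query-document heterogeneous information network are proposed. In the regularization framework, a cost function on the heterogeneous information network is constructed under the manifold assumption, and its closed-form solution is derived and used to predict the class labels of unlabeled queries and documents. Experiments on the query log of a large-scale commercial search engine show that the proposed method outperforms traditional semi-supervised learning methods.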

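Key words (translated from the Chinese): heterogeneous information network, semi-supervised learning, information retrieval, click log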

Abstract:

Various graph-based algorithms for semi-supervised learning have been proposed in the recent literature. However, although classification on homogeneous networks has been studied for decades, classification on heterogeneous networks has been explored only recently. This paper considers the semi-supervised classification problem on a query-document heterogeneous information network, which combines the click-through bipartite graph with content information from both the query side and the document side. To strengthen the network structure, class information from the sample nodes is introduced. A semi-supervised learning algorithm is investigated under two frameworks: a novel graph-based regularization framework and an iterative framework. In the regularization framework, a new cost function is developed that accounts for both the direct relationship between the two entity sets and the content information from both sides, which leads to a significant improvement over the baseline methods. Experimental results show that the proposed method achieves the best performance among the compared methods, with consistent improvements.

Key words: heterogeneous information networks, semi-supervised learning, information retrieval, click-through data
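
The abstract does not give the paper's cost function, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the standard "local and global consistency" label-propagation scheme, applied to a single graph that stacks query-query content similarity, document-document content similarity, and the query-document click graph. The function names (`build_affinity`, `normalize`, `propagate`, `predict`), the damping parameter `alpha`, and the toy data are all hypothetical. Under these assumptions the iteration F ← αSF + (1 − α)Y converges to the closed-form fixed point F* = (1 − α)(I − αS)⁻¹Y.

```python
from math import sqrt

def build_affinity(W_qq, W_dd, C):
    """Stack queries and documents into one symmetric affinity matrix.

    W_qq: query-query content similarity (nq x nq), assumed symmetric
    W_dd: document-document content similarity (nd x nd), assumed symmetric
    C:    click counts between queries and documents (nq x nd)
    All weights are assumed non-negative.
    """
    nq, nd = len(W_qq), len(W_dd)
    n = nq + nd
    W = [[0.0] * n for _ in range(n)]
    for i in range(nq):
        for j in range(nq):
            if i != j:  # no self-loops
                W[i][j] = float(W_qq[i][j])
        for j in range(nd):
            # click edges link the two node types in both directions
            W[i][nq + j] = W[nq + j][i] = float(C[i][j])
    for i in range(nd):
        for j in range(nd):
            if i != j:
                W[nq + i][nq + j] = float(W_dd[i][j])
    return W

def normalize(W):
    """Symmetric normalization S = D^(-1/2) W D^(-1/2)."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [[W[i][j] / sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def propagate(S, Y, alpha=0.9, n_iter=300):
    """Iterate F <- alpha * S * F + (1 - alpha) * Y.

    Y has one row per node: one-hot for labeled nodes, all zeros otherwise.
    """
    n, k = len(Y), len(Y[0])
    F = [[float(v) for v in row] for row in Y]
    for _ in range(n_iter):
        F = [[alpha * sum(S[i][j] * F[j][c] for j in range(n))
              + (1 - alpha) * Y[i][c]
              for c in range(k)] for i in range(n)]
    return F

def predict(F):
    """Assign each node the class with the largest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in F]

# Toy data (hypothetical): two labeled queries, two unlabeled documents.
W_qq = [[0, 0], [0, 0]]
W_dd = [[0, 0], [0, 0]]
C = [[3, 0], [0, 2]]                    # q0 clicked d0, q1 clicked d1
Y = [[1, 0], [0, 1], [0, 0], [0, 0]]    # rows: q0, q1, d0, d1
F = propagate(normalize(build_affinity(W_qq, W_dd, C)), Y)
labels = predict(F)
```

In this toy case each document inherits the class of the only query that clicked on it. A real query log would also need non-zero content similarities and a tuned `alpha`, neither of which the abstract specifies.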
