Big Data Research ›› 2024, Vol. 10 ›› Issue (4): 106-120. doi: 10.11959/j.issn.2096-0271.2024052

• Research •

A dual-channel semi-supervised network representation learning model

Hangyuan DU1, Fuzhong XIE1, Wenjian WANG1, Liang BAI2

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China
  2. Institute of Intelligent Information Processing, Shanxi University, Taiyuan 030006, Shanxi, China
  • Online: 2024-07-01 Published: 2024-07-01
  • About the authors:
    Hangyuan DU (1985- ), male, Ph.D., is an associate professor at the School of Computer and Information Technology, Shanxi University. His main research interests are graph machine learning theory and its applications.
    Fuzhong XIE (1998- ), male, is a master's student at the School of Computer and Information Technology, Shanxi University. His main research interests are machine learning and network data mining.
    Wenjian WANG (1968- ), female, Ph.D., is a professor and the dean of the School of Computer and Information Technology, Shanxi University. Her main research interests are machine learning, data mining, and computational intelligence.
    Liang BAI (1982- ), male, Ph.D., is a professor and the director of the Institute of Intelligent Information Processing, Shanxi University. His main research interests are machine learning and data mining.
  • Supported by:
    The National Natural Science Foundation of China (U21A20513, 62076154, 62276159, 62276161); the Key R&D Program of Shanxi Province (202302010101007, 202202020101003); the Fundamental Research Program of Shanxi Province (202303021221055)

Abstract:

In semi-supervised network representation learning, node labels guide how a network is mapped between different spaces. In many practical tasks, however, label information is scarce or hard to obtain, so the learning of low-dimensional network representations receives insufficient supervision. To address this problem, a dual-channel semi-supervised network representation learning model is proposed. The model uses an autoencoder as its basic framework and consists of two information transmission channels, one self-supervised and one semi-supervised. Self-supervised signals and label information guide the construction of the representation mapping in their respective channels, and the two complement and reinforce each other. Because the two channels may carry redundant information, a redundancy identification and elimination mechanism is designed from a mutual information perspective. On this basis, an integrated optimization model is constructed so that self-supervised and semi-supervised learning work together, enabling the learned representations to better capture and preserve the structure and characteristics of the network. Experimental results on real-world datasets show that the representations learned by the proposed model outperform baseline methods in node classification, clustering, and visualization tasks.

Key words: semi-supervised network representation learning, label information, self-supervised learning, mutual information, graph neural network
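
As a rough illustration of how an integrated objective of this kind could be assembled, the NumPy sketch below adds a reconstruction term, a cross-entropy term over the labeled nodes, and a redundancy penalty between the two channels' embeddings. It is not the authors' formulation, which the abstract does not specify. The function names, the weights `alpha` and `beta`, the toy data, and the use of squared cross-correlation as a stand-in for a mutual-information estimate are all assumptions made for this example.

```python
import numpy as np

def reconstruction_loss(adj, z):
    """Mean squared error between the adjacency matrix and sigmoid(z z^T)."""
    recon = 1.0 / (1.0 + np.exp(-(z @ z.T)))
    return float(np.mean((adj - recon) ** 2))

def labeled_cross_entropy(logits, labels, labeled_idx):
    """Softmax cross-entropy computed only on the labeled nodes."""
    sel = logits[labeled_idx]
    sel = sel - sel.max(axis=1, keepdims=True)  # shift for numerical stability
    log_prob = sel - np.log(np.exp(sel).sum(axis=1, keepdims=True))
    rows = np.arange(len(labeled_idx))
    return float(-np.mean(log_prob[rows, labels[labeled_idx]]))

def redundancy_penalty(z_a, z_b, eps=1e-8):
    """Mean squared cross-correlation between the two channels' dimensions.

    Assumption: a simple proxy for shared information between channels,
    NOT a mutual-information estimator.
    """
    a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    corr = (a.T @ b) / z_a.shape[0]
    return float(np.mean(corr ** 2))

def joint_objective(adj, z_self, z_semi, logits, labels, labeled_idx,
                    alpha=1.0, beta=0.1):
    """Weighted sum of the three terms; alpha and beta are hypothetical weights."""
    return (reconstruction_loss(adj, z_self)
            + alpha * labeled_cross_entropy(logits, labels, labeled_idx)
            + beta * redundancy_penalty(z_self, z_semi))

# Toy data (hypothetical): 6 nodes, 4-dimensional embeddings, 2 classes,
# with only nodes 0 and 1 carrying labels.
rng = np.random.default_rng(0)
n, d, c = 6, 4, 2
adj = (rng.random((n, n)) < 0.4).astype(float)
adj = np.maximum(adj, adj.T)           # make the graph undirected
z_self = rng.normal(size=(n, d))       # self-supervised channel embedding
z_semi = rng.normal(size=(n, d))       # semi-supervised channel embedding
logits = rng.normal(size=(n, c))       # class scores from the semi-supervised channel
labels = np.array([0, 1, 0, 1, 0, 1])
labeled_idx = np.array([0, 1])

loss = joint_objective(adj, z_self, z_semi, logits, labels, labeled_idx)
print(loss)
```

In a trainable version the embeddings and logits would come from encoder networks and the loss would be minimized with an automatic-differentiation framework; here they are random arrays so that only the structure of the objective is visible.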

CLC number:
