Learning attribute network algorithm based on high-order similarity

Telecommunications Science, 2020, Vol. 36, Issue (12): 20-32. doi: 10.11959/j.issn.1000-0801.2020309

Shaoqing WU, Yihong DONG, Xiong WANG, Yan CAO, Yu XIN

Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China

Revised: 2020-11-27 Online: 2020-12-20 Published: 2020-12-23
About the authors: Shaoqing WU (1995- ), male, master's student at the Faculty of Electrical Engineering and Computer Science, Ningbo University; research interests: big data and data mining | Yihong DONG (1969- ), male, Ph.D., professor and master's supervisor at the Faculty of Electrical Engineering and Computer Science, Ningbo University; research interests: big data, data mining, and artificial intelligence | Xiong WANG (1994- ), male, master's student at the Faculty of Electrical Engineering and Computer Science, Ningbo University; research interest: data mining | Yan CAO (1993- ), female, master's student at the Faculty of Electrical Engineering and Computer Science, Ningbo University; research interests: big data and data mining | Yu XIN (1987- ), male, doctoral student at the Faculty of Electrical Engineering and Computer Science, Ningbo University; research interests: databases and knowledge engineering
Supported by:
Zhejiang Provincial Natural Science Foundation of China (LY20F020009); Zhejiang Provincial Natural Science Foundation of China (LZ20F020001); The National Natural Science Foundation of China (61602133); Ningbo Natural Science Foundation (202003N4086); Ningbo Natural Science Foundation (2019A610093)

Abstract:

Existing network representation learning methods do not sufficiently mine and exploit the deep-level information implicit in a network. To explore this latent information further, a potential pattern structure similarity is proposed, and a similarity score between network structures is defined to measure how similar the structures are, so that nodes can cross irrelevant vertices and obtain high-order similarity over the global structure. To achieve the best effect, deep learning is used to fuse multiple information sources in joint training, which makes up for the deficiencies of random walks and lets the information sources combine closely and complement one another. In the experiments, Lap, DeepWalk, TADW, SDNE and CANE are selected as comparison methods, and three real-world networks are used as datasets to verify the effectiveness of the model through node classification and link reconstruction. In node classification, performance improves by 1.7 percentage points on average across datasets and training proportions. In link reconstruction, only half the embedding dimension is needed to achieve better performance. Finally, the performance of the model at different network depths is discussed: increasing the depth of the model raises the average node classification performance by 1.1 percentage points.

Key words: network representation learning, graph embedding, attribute network, structure information
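The abstract does not give the formula for the structure similarity score, so the following is only a toy illustration of the general idea that two nodes can be structurally alike without being close to each other in the graph. The signature (sorted neighbour degrees), the multiset Jaccard score, and the function names are all assumptions made for this sketch, not the paper's definitions.

```python
from collections import Counter


def neighbor_degree_signature(adj, node):
    """Sorted degrees of a node's neighbours: a crude local structure signature (assumed)."""
    return sorted(len(adj[n]) for n in adj[node])


def structure_similarity(adj, u, v):
    """Multiset Jaccard overlap of two signatures, in [0, 1] (hypothetical score)."""
    a = Counter(neighbor_degree_signature(adj, u))
    b = Counter(neighbor_degree_signature(adj, v))
    inter = sum((a & b).values())  # multiset intersection: element-wise minimum
    union = sum((a | b).values())  # multiset union: element-wise maximum
    return inter / union if union else 1.0


# Toy undirected graph: two star hubs (0 and 5) joined by the path 0-9-10-11-5.
adj = {
    0: {1, 2, 3, 9},
    1: {0}, 2: {0}, 3: {0},
    9: {0, 10}, 10: {9, 11}, 11: {10, 5},
    5: {6, 7, 8, 11},
    6: {5}, 7: {5}, 8: {5},
}

hub_pair = structure_similarity(adj, 0, 5)  # distant nodes with the same local pattern
hub_leaf = structure_similarity(adj, 0, 1)  # adjacent nodes with different patterns
print(hub_pair, hub_leaf)
```

In this toy graph the two hubs are four hops apart yet score 1.0, while hub 0 and its own leaf 1 score 0.0, which is the kind of relation a purely neighbourhood-based walk would tend to miss.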

CLC number: TP311

Shaoqing WU, Yihong DONG, Xiong WANG, Yan CAO, Yu XIN. Learning attribute network algorithm based on high-order similarity[J]. Telecommunications Science, 2020, 36(12): 20-32.



Method / training ratio	30%	40%	50%	60%	70%	80%	90%
Lap	29.66%	30.54%	31.67%	33.87%	36.29%	39.53%	42.71%
DeepWalk	79.58%	80.88%	81.54%	82.12%	82.90%	82.83%	83.88%
SDNE	59.51%	61.74%	63.39%	64.36%	65.05%	65.61%	65.55%
TADW	53.49%	56.17%	57.92%	59.47%	60.29%	60.86%	61.72%
CANE	57.28%	60.25%	60.89%	63.27%	64.08%	64.65%	65.24%
LANAHS_A	81.02%	81.27%	82.26%	83.02%	83.71%	84.12%	84.31%
LANAHS	82.15%	82.84%	83.57%	84.21%	84.62%	85.06%	85.73%

Method / training ratio	30%	40%	50%	60%	70%	80%	90%
Lap	51.61%	52.86%	55.11%	56.64%	58.27%	59.71%	59.59%
DeepWalk	82.56%	82.71%	82.64%	82.65%	82.79%	83.17%	83.25%
SDNE	53.08%	53.10%	53.18%	53.26%	53.71%	54.08%	54.15%
TADW	82.50%	82.71%	82.91%	83.29%	83.29%	84.04%	84.04%
CANE	82.61%	82.82%	83.08%	83.12%	83.13%	83.24%	83.56%
LANAHS_A	83.01%	83.24%	83.59%	83.62%	83.73%	84.25%	84.25%
LANAHS	83.75%	84.75%	84.89%	85.26%	85.28%	85.43%	85.72%
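As a reading aid for the table directly above, the short sketch below computes, for each training ratio, how far LANAHS sits above the strongest comparison method. The numbers are copied from that table; treating LANAHS_A as a variant of the proposed model rather than a baseline is an assumption, and the result covers this one table only, so it is not presented as a reproduction of the average quoted in the abstract.

```python
# Values copied from the table above (percent); columns are training ratios 30%..90%.
ratios = [30, 40, 50, 60, 70, 80, 90]
baselines = {
    "Lap": [51.61, 52.86, 55.11, 56.64, 58.27, 59.71, 59.59],
    "DeepWalk": [82.56, 82.71, 82.64, 82.65, 82.79, 83.17, 83.25],
    "SDNE": [53.08, 53.10, 53.18, 53.26, 53.71, 54.08, 54.15],
    "TADW": [82.50, 82.71, 82.91, 83.29, 83.29, 84.04, 84.04],
    "CANE": [82.61, 82.82, 83.08, 83.12, 83.13, 83.24, 83.56],
}
# LANAHS_A is left out of the baselines (assumed to be a variant of the proposed model).
lanahs = [83.75, 84.75, 84.89, 85.26, 85.28, 85.43, 85.72]

# Strongest baseline at each training ratio.
best_baseline = [max(column) for column in zip(*baselines.values())]

# Margin of LANAHS over that baseline, in percentage points.
margins = [round(ours - best, 2) for ours, best in zip(lanahs, best_baseline)]
mean_margin = round(sum(margins) / len(margins), 2)

for ratio, best, margin in zip(ratios, best_baseline, margins):
    print(f"{ratio}%: best baseline {best:.2f}, margin {margin:+.2f}")
print(f"mean margin: {mean_margin:.2f} percentage points")
```

For this table the per-ratio margins range from 1.14 to 1.99 percentage points, with a mean of about 1.70.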

Dimensions	Lap	DeepWalk	SDNE	TADW	CANE	LANAHS_A	LANAHS
16	0.9592	0.9952	0.6524	0.5695	0.6025	0.9854	0.9963
32	0.9731	0.9973	0.6620	0.5996	0.6371	0.9941	0.9985
48	0.9786	0.9981	0.6789	0.6260	0.6621	0.9984	0.9990
64	0.9816	0.9984	0.6794	0.6306	0.6736	0.9990	0.9990
80	0.9836	0.9985	0.7189	0.6616	0.6921	0.9990	0.9990
96	0.9858	0.9985	0.7381	0.6618	0.7082	0.9989	0.9990
112	0.9869	0.9985	0.7736	0.6695	0.7254	0.9988	0.9990
128	0.9888	0.9985	0.8877	0.6794	0.7365	0.9988	0.9990

Dimensions	Lap	DeepWalk	SDNE	TADW	CANE	LANAHS_A	LANAHS
16	0.9109	0.9948	0.5196	0.9187	0.9054	0.9825	0.9969
32	0.9301	0.9970	0.5234	0.9553	0.9523	0.9835	0.9981
48	0.9521	0.9978	0.5242	0.9666	0.9574	0.9837	0.9992
64	0.9586	0.9982	0.5267	0.9730	0.9787	0.9854	0.9992
80	0.9659	0.9985	0.5316	0.9764	0.9792	0.9939	0.9993
96	0.9753	0.9987	0.5336	0.9787	0.9858	0.9940	0.9993
112	0.9768	0.9988	0.5341	0.9805	0.9859	0.9965	0.9993
128	0.9784	0.9989	0.5393	0.9819	0.9863	0.9967	0.9993

Dimensions	Lap	DeepWalk	SDNE	TADW	CANE	LANAHS_A	LANAHS
16	0.7914	0.9098	0.5202	0.7998	0.8386	0.9123	0.9182
32	0.8116	0.9270	0.5706	0.8281	0.8579	0.9175	0.9401
48	0.8529	0.9351	0.5376	0.8380	0.8624	0.9234	0.9497
64	0.8663	0.9398	0.5715	0.8482	0.8712	0.9349	0.9535
80	0.8765	0.9427	0.6117	0.8555	0.8841	0.9421	0.9557
96	0.8838	0.9447	0.5673	0.8624	0.8897	0.9456	0.9554
112	0.8906	0.9459	0.6081	0.8663	0.8901	0.9480	0.9551
128	0.8958	0.9469	0.5611	0.8717	0.8942	0.9473	0.9543


Information ratio	Cora	DBLP
1:1:1	83.65%	84.72%
2:2:1	83.05%	85.26%
2:1:1	84.21%	84.21%
2:1:2	83.37%	83.12%

Number of layers	Cora	DBLP
No hidden layer	82.65%	84.02%
1 layer	83.57%	84.89%
2 layers	84.24%	85.45%
3 layers	84.31%	86.32%
4 layers	84.31%	86.15%
