Telecommunications Science ›› 2020, Vol. 36 ›› Issue (12): 20-32. doi: 10.11959/j.issn.1000-0801.2020309
Shaoqing WU, Yihong DONG, Xiong WANG, Yan CAO, Yu XIN
Revised: 2020-11-27
Online: 2020-12-20
Published: 2020-12-23
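About the authors:
Shaoqing WU (1995- ), male, master's student at the School of Information Science and Engineering, Ningbo University. Main research interests: big data, data mining.
Yihong DONG (1969- ), male, Ph.D., professor and master's supervisor at the School of Information Science and Engineering, Ningbo University. Main research interests: big data, data mining, artificial intelligence.
Xiong WANG (1994- ), male, master's student at the School of Information Science and Engineering, Ningbo University. Main research interest: data mining.
Yan CAO (1993- ), female, master's student at the School of Information Science and Engineering, Ningbo University. Main research interests: big data, data mining.
Yu XIN (1987- ), male, doctoral student at the School of Information Science and Engineering, Ningbo University. Main research interests: databases, knowledge engineering.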
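Abstract:
Existing network representation learning methods do not mine or exploit the deeper information implicit in a network. To mine this latent information further, a latent pattern-structure similarity is proposed, and a similarity score between network structures is defined to measure how similar the structures are, so that a node can reach across unrelated vertices and capture high-order similarity in the global structure. Deep learning is used to fuse multiple information sources in joint training, which compensates for the shortcomings of random walks and lets the sources combine closely and complement one another. In the experiments, Lap, DeepWalk, TADW, SDNE and CANE serve as baseline methods, and three real-world networks serve as datasets to validate the model on node classification and link reconstruction. In node classification, performance improves by 1.7 percentage points on average across datasets and training ratios. In link reconstruction, the model achieves better performance with only half the embedding dimensions. Finally, the performance gain at different network depths is discussed: increasing the depth of the model raises average node classification performance by 1.1 percentage points.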
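The abstract does not give the definition of the structural similarity score, so the following is only an illustrative stand-in, not the authors' formula. It sketches the general idea of scoring two nodes by how alike their surrounding structure is, even when no path connects them. The function names (`hop_degree_rings`, `ring_overlap`, `structural_similarity`), the hop limit `k`, and the toy graph are all assumptions made for this example.

```python
from collections import Counter


def hop_degree_rings(adj, node, k):
    """Sorted node degrees at hop 0..k from `node`, found by breadth-first search."""
    seen = {node}
    frontier = [node]
    rings = []
    for _ in range(k + 1):
        rings.append(sorted(len(adj[v]) for v in frontier))
        next_frontier = []
        for v in frontier:
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return rings


def ring_overlap(a, b):
    """Multiset Jaccard overlap of two degree lists, in [0, 1]."""
    if not a and not b:
        return 1.0  # two empty rings are treated as identical
    ca, cb = Counter(a), Counter(b)
    return sum((ca & cb).values()) / sum((ca | cb).values())


def structural_similarity(adj, u, v, k=2):
    """Average ring overlap over hops 0..k (a stand-in score, not the paper's)."""
    rings_u = hop_degree_rings(adj, u, k)
    rings_v = hop_degree_rings(adj, v, k)
    return sum(ring_overlap(a, b) for a, b in zip(rings_u, rings_v)) / (k + 1)


# Hypothetical toy graph: two disconnected copies of a triangle with a tail.
ADJ = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2],
    4: [5, 6], 5: [4, 6], 6: [4, 5, 7], 7: [6],
}
same_role = structural_similarity(ADJ, 2, 6)   # same role, different components
neighbours = structural_similarity(ADJ, 2, 3)  # adjacent, different roles
print(same_role, neighbours)
```

In this toy graph, nodes 2 and 6 are unreachable from each other yet score as structurally identical, while the adjacent nodes 2 and 3 score low. A purely walk-based proximity would rank the two pairs the other way round, which is the kind of gap the abstract says the similarity score is meant to close.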
Cite this article:
Shaoqing WU, Yihong DONG, Xiong WANG, Yan CAO, Yu XIN. Learning attribute network algorithm based on high-order similarity[J]. Telecommunications Science, 2020, 36(12): 20-32.
Table 1. Node classification results on the Cora dataset (columns: training ratio)
Method | 30% | 40% | 50% | 60% | 70% | 80% | 90%
Lap | 29.66% | 30.54% | 31.67% | 33.87% | 36.29% | 39.53% | 42.71%
DeepWalk | 79.58% | 80.88% | 81.54% | 82.12% | 82.90% | 82.83% | 83.88%
SDNE | 59.51% | 61.74% | 63.39% | 64.36% | 65.05% | 65.61% | 65.55%
TADW | 53.49% | 56.17% | 57.92% | 59.47% | 60.29% | 60.86% | 61.72%
CANE | 57.28% | 60.25% | 60.89% | 63.27% | 64.08% | 64.65% | 65.24%
LANAHS_A | 81.02% | 81.27% | 82.26% | 83.02% | 83.71% | 84.12% | 84.31%
LANAHS | 82.15% | 82.84% | 83.57% | 84.21% | 84.62% | 85.06% | 85.73%
Table 2. Node classification results on the DBLP dataset (columns: training ratio)
Method | 30% | 40% | 50% | 60% | 70% | 80% | 90%
Lap | 51.61% | 52.86% | 55.11% | 56.64% | 58.27% | 59.71% | 59.59%
DeepWalk | 82.56% | 82.71% | 82.64% | 82.65% | 82.79% | 83.17% | 83.25%
SDNE | 53.08% | 53.10% | 53.18% | 53.26% | 53.71% | 54.08% | 54.15%
TADW | 82.50% | 82.71% | 82.91% | 83.29% | 83.29% | 84.04% | 84.04%
CANE | 82.61% | 82.82% | 83.08% | 83.12% | 83.13% | 83.24% | 83.56%
LANAHS_A | 83.01% | 83.24% | 83.59% | 83.62% | 83.73% | 84.25% | 84.25%
LANAHS | 83.75% | 84.75% | 84.89% | 85.26% | 85.28% | 85.43% | 85.72%
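The text here does not state how the average node classification gain was computed. As a rough cross-check, the sketch below takes one plausible reading: for each training ratio, subtract the best baseline score from the LANAHS score, then average over the ratios. The numbers are copied from the two node classification tables above. Excluding LANAHS_A from the baselines (treating it as a variant of the proposed model) is an assumption, as are the names `CORA`, `DBLP`, `BASELINES`, and `mean_margin`.

```python
# Scores in percent, one value per training ratio (30% to 90%), copied from the tables.
CORA = {
    "Lap": [29.66, 30.54, 31.67, 33.87, 36.29, 39.53, 42.71],
    "DeepWalk": [79.58, 80.88, 81.54, 82.12, 82.90, 82.83, 83.88],
    "SDNE": [59.51, 61.74, 63.39, 64.36, 65.05, 65.61, 65.55],
    "TADW": [53.49, 56.17, 57.92, 59.47, 60.29, 60.86, 61.72],
    "CANE": [57.28, 60.25, 60.89, 63.27, 64.08, 64.65, 65.24],
    "LANAHS": [82.15, 82.84, 83.57, 84.21, 84.62, 85.06, 85.73],
}
DBLP = {
    "Lap": [51.61, 52.86, 55.11, 56.64, 58.27, 59.71, 59.59],
    "DeepWalk": [82.56, 82.71, 82.64, 82.65, 82.79, 83.17, 83.25],
    "SDNE": [53.08, 53.10, 53.18, 53.26, 53.71, 54.08, 54.15],
    "TADW": [82.50, 82.71, 82.91, 83.29, 83.29, 84.04, 84.04],
    "CANE": [82.61, 82.82, 83.08, 83.12, 83.13, 83.24, 83.56],
    "LANAHS": [83.75, 84.75, 84.89, 85.26, 85.28, 85.43, 85.72],
}
# Assumption: LANAHS_A is a variant of the proposed model, so it is not a baseline.
BASELINES = ["Lap", "DeepWalk", "SDNE", "TADW", "CANE"]


def mean_margin(table, ours="LANAHS"):
    """Mean gap (percentage points) between `ours` and the best baseline per column."""
    n_columns = len(table[ours])
    margins = []
    for i in range(n_columns):
        best_baseline = max(table[name][i] for name in BASELINES)
        margins.append(table[ours][i] - best_baseline)
    return sum(margins) / n_columns


cora_gain = mean_margin(CORA)
dblp_gain = mean_margin(DBLP)
print(f"Cora: {cora_gain:.2f} points, DBLP: {dblp_gain:.2f} points")
```

Under this reading the margin comes out at roughly 2.1 points on Cora and 1.7 points on DBLP. The two tables alone are not enough to reproduce the overall average reported in the abstract, since the averaging procedure and any further node classification results are not given here.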
Table 3. Link reconstruction results on the Cora dataset (rows: embedding dimension)
Dimension | Lap | DeepWalk | SDNE | TADW | CANE | LANAHS_A | LANAHS
16 | 0.9592 | 0.9952 | 0.6524 | 0.5695 | 0.6025 | 0.9854 | 0.9963
32 | 0.9731 | 0.9973 | 0.6620 | 0.5996 | 0.6371 | 0.9941 | 0.9985
48 | 0.9786 | 0.9981 | 0.6789 | 0.6260 | 0.6621 | 0.9984 | 0.9990
64 | 0.9816 | 0.9984 | 0.6794 | 0.6306 | 0.6736 | 0.9990 | 0.9990
80 | 0.9836 | 0.9985 | 0.7189 | 0.6616 | 0.6921 | 0.9990 | 0.9990
96 | 0.9858 | 0.9985 | 0.7381 | 0.6618 | 0.7082 | 0.9989 | 0.9990
112 | 0.9869 | 0.9985 | 0.7736 | 0.6695 | 0.7254 | 0.9988 | 0.9990
128 | 0.9888 | 0.9985 | 0.8877 | 0.6794 | 0.7365 | 0.9988 | 0.9990
Table 4. Link reconstruction results on the DBLP dataset (rows: embedding dimension)
Dimension | Lap | DeepWalk | SDNE | TADW | CANE | LANAHS_A | LANAHS
16 | 0.9109 | 0.9948 | 0.5196 | 0.9187 | 0.9054 | 0.9825 | 0.9969
32 | 0.9301 | 0.9970 | 0.5234 | 0.9553 | 0.9523 | 0.9835 | 0.9981
48 | 0.9521 | 0.9978 | 0.5242 | 0.9666 | 0.9574 | 0.9837 | 0.9992
64 | 0.9586 | 0.9982 | 0.5267 | 0.9730 | 0.9787 | 0.9854 | 0.9992
80 | 0.9659 | 0.9985 | 0.5316 | 0.9764 | 0.9792 | 0.9939 | 0.9993
96 | 0.9753 | 0.9987 | 0.5336 | 0.9787 | 0.9858 | 0.9940 | 0.9993
112 | 0.9768 | 0.9988 | 0.5341 | 0.9805 | 0.9859 | 0.9965 | 0.9993
128 | 0.9784 | 0.9989 | 0.5393 | 0.9819 | 0.9863 | 0.9967 | 0.9993
Table 5. Link reconstruction results on the Facebook dataset (rows: embedding dimension)
Dimension | Lap | DeepWalk | SDNE | TADW | CANE | LANAHS_A | LANAHS
16 | 0.7914 | 0.9098 | 0.5202 | 0.7998 | 0.8386 | 0.9123 | 0.9182
32 | 0.8116 | 0.9270 | 0.5706 | 0.8281 | 0.8579 | 0.9175 | 0.9401
48 | 0.8529 | 0.9351 | 0.5376 | 0.8380 | 0.8624 | 0.9234 | 0.9497
64 | 0.8663 | 0.9398 | 0.5715 | 0.8482 | 0.8712 | 0.9349 | 0.9535
80 | 0.8765 | 0.9427 | 0.6117 | 0.8555 | 0.8841 | 0.9421 | 0.9557
96 | 0.8838 | 0.9447 | 0.5673 | 0.8624 | 0.8897 | 0.9456 | 0.9554
112 | 0.8906 | 0.9459 | 0.6081 | 0.8663 | 0.8901 | 0.9480 | 0.9551
128 | 0.8958 | 0.9469 | 0.5611 | 0.8717 | 0.8942 | 0.9473 | 0.9543
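The abstract's claim that the model does better with only half the dimensions can be checked against the link reconstruction tables. The sketch below assumes one reading of that claim: LANAHS at 64 dimensions compared with every baseline at 128 dimensions. The text does not say which pair of dimensions the authors compared, so that pairing, along with the names `TABLES` and `beats_all_at_half_dimension`, is an assumption. All scores are copied from the three link reconstruction tables.

```python
# "ours_64": LANAHS score at 64 dimensions.
# "baselines_128": each baseline's score at 128 dimensions.
TABLES = {
    "Cora": {
        "ours_64": 0.9990,
        "baselines_128": {"Lap": 0.9888, "DeepWalk": 0.9985, "SDNE": 0.8877,
                          "TADW": 0.6794, "CANE": 0.7365},
    },
    "DBLP": {
        "ours_64": 0.9992,
        "baselines_128": {"Lap": 0.9784, "DeepWalk": 0.9989, "SDNE": 0.5393,
                          "TADW": 0.9819, "CANE": 0.9863},
    },
    "Facebook": {
        "ours_64": 0.9535,
        "baselines_128": {"Lap": 0.8958, "DeepWalk": 0.9469, "SDNE": 0.5611,
                          "TADW": 0.8717, "CANE": 0.8942},
    },
}


def beats_all_at_half_dimension(entry):
    """True if the 64-dimension LANAHS score exceeds every 128-dimension baseline."""
    best_baseline = max(entry["baselines_128"].values())
    return entry["ours_64"] > best_baseline


results = {name: beats_all_at_half_dimension(entry) for name, entry in TABLES.items()}
for name, holds in results.items():
    print(f"{name}: {holds}")
```

Under this reading the claim holds on all three datasets, with DeepWalk the strongest baseline each time. The margins on Cora and DBLP are a few ten-thousandths, so the Facebook table carries most of the evidence.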