Network Representation Learning

doi:10.11959/j.issn.2096-0271.2015025

Abstract:

With the continuing growth of large online social networks such as Facebook, Twitter, Weixin and Weibo, massive amounts of data reflecting network structure are being generated. A key question in applying machine learning techniques to analyze network data is how to represent the data. This paper first introduces the research background and the related definitions of network representation learning (NRL). It then presents five major categories of current NRL algorithms, organized by algorithm type, with a detailed treatment of NRL techniques based on deep learning. Next, it discusses the evaluation methods and application scenarios of NRL. Finally, it discusses future research prospects for NRL.

Key words: network, representation learning, deep learning

CLC number: TP181

Weizheng Chen, Yan Zhang, Xiaoming Li. Network Representation Learning[J]. Big Data Research, 2015, 1(3): 8-22.

Figures and tables: 13 (Figures 1 to 12, Table 1)

References (46)

[1]	Mairal J, Ponce J, Sapiro G, et al. Supervised dictionary learning. Proceedings of the 2009 Conference on Neural Information Processing Systems, Vancouver, Canada, 2009: 1033-1040.
[2]	Roweis S T, Saul L K. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000, 290(5500): 2323-2326.
[3]	Hyvärinen A, Oja E. Independent component analysis: algorithms and applications. Neural Networks, 2000, 13(4-5): 411-430.
[4]	Lee H, Battle A, Raina R, et al. Efficient sparse coding algorithms. Proceedings of the 2006 Conference on Neural Information Processing Systems, Vancouver, Canada, 2006: 801-808.
[5]	Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8): 1798-1828.
[6]	Chen M, Yang Q, Tang X O. Directed graph embedding. Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007: 2707-2712.
[7]	Kannan R, Vempala S. Spectral algorithms. Foundations and Trends in Theoretical Computer Science, 2009, 4(3-4): 157-288.
[8]	Brand M, Huang K. A unifying theorem for spectral embedding and clustering. Proceedings of the 9th International Workshop on Artificial Intelligence and Statistics, Florida, USA, 2003.
[9]	Le T M V, Lauw H W. Probabilistic latent document network embedding. Proceedings of the 2014 IEEE International Conference on Data Mining (ICDM), Shenzhen, China, 2014: 270-279.
[10]	Chojnacki W, Brooks M J. A note on the locally linear embedding algorithm. International Journal of Pattern Recognition and Artificial Intelligence, 2009, 23(8): 1739-1752.
[11]	Belkin M, Niyogi P. Laplacian eigenmaps and spectral techniques for embedding and clustering. Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), Cambridge, UK, 2001: 585-591.
[12]	Tang L, Liu H. Relational learning via latent social dimensions. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 2009: 817-826.
[13]	Newman M. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 2006, 103(23): 8577-8582.
[14]	Zhou D Y, Huang J Y, Schölkopf B. Learning from labeled and unlabeled data on a directed graph. Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, 2005: 1036-1043.
[15]	Jacob Y, Denoyer L, Gallinari P. Learning latent representations of nodes for classifying in heterogeneous social networks. Proceedings of the 7th ACM International Conference on Web Search and Data Mining, New York, USA, 2014: 373-382.
[16]	Yang J, Leskovec J. Modeling information diffusion in implicit networks. Proceedings of the 2010 IEEE 10th International Conference on Data Mining (ICDM), Sydney, Australia, 2010: 599-608.
[17]	Bourigault S, Lagnier C, Lamprier S, et al. Learning social network embeddings for predicting information diffusion. Proceedings of the 7th ACM International Conference on Web Search and Data Mining, New York, USA, 2014: 393-402.
[18]	Nallapati R, Ahmed A, Xing E, et al. Joint latent topic models for text and citations. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, USA, 2008: 542-550.
[19]	Chang J, Blei D. Relational topic models for document networks. Proceedings of the International Conference on Artificial Intelligence and Statistics, Clearwater Beach, Florida, USA, 2009: 81-88.
[20]	Iwata T, Saito K, Ueda N, et al. Parametric embedding for class visualization. Neural Computation, 2007, 19(9): 2536-2556.
[21]	Gopalan P, Blei D. Efficient discovery of overlapping communities in massive networks. Proceedings of the National Academy of Sciences, 2013, 110(36): 14534-14539.
[22]	Gopalan P, Mimno D, Gerrish S, et al. Scalable inference of overlapping communities. Proceedings of the 2012 Conference on Neural Information Processing Systems, Lake Tahoe, USA, 2012: 2249-2257.
[23]	Hu Z T, Yao J J, Cui B, et al. Community level diffusion extraction. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, 2015: 1555-1569.
[24]	Kobourov S. Spring embedders and force directed graph drawing algorithms. arXiv preprint arXiv:1201.3011, 2012.
[25]	Fruchterman T, Reingold E. Graph drawing by force-directed placement. Software: Practice and Experience, 1991, 21(11): 1129-1164.
[26]	Kamada T, Kawai S. An algorithm for drawing general undirected graphs. Information Processing Letters, 1989, 31(1): 7-15.
[27]	Bastian M, Heymann S, Jacomy M. Gephi: an open source software for exploring and manipulating networks. Proceedings of the 3rd International Conference on Weblogs and Social Media, San Jose, California, USA, 2009: 361-362.
[28]	Ellson J, Gansner E, Koutsofios L, et al. Graphviz: open source graph drawing tools. Graph Drawing. Berlin Heidelberg: Springer, 2002.
[29]	Bengio Y, Goodfellow I, Courville A. Deep Learning. 2015.
[30]	Perozzi B, Al-Rfou R, Skiena S. DeepWalk: online learning of social representations. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, USA, 2014: 701-710.
[31]	Tang J, Qu M, Wang M Z, et al. LINE: large-scale information network embedding. Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 2015: 1067-1077.
[32]	Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality. Proceedings of the 2013 Conference on Neural Information Processing Systems, Lake Tahoe, USA, 2013: 3111-3119.
[33]	Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[34]	Mikolov T, Yih W T, Zweig G. Linguistic regularities in continuous space word representations. Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Atlanta, USA, 2013: 746-751.
[35]	Hinton G E. Learning distributed representations of concepts. Proceedings of the Eighth Annual Conference of the Cognitive Science Society, Amherst, Massachusetts, USA, 1986: 1-12.
[36]	Bengio Y, Ducharme R, Vincent P, et al. A neural probabilistic language model. Journal of Machine Learning Research, 2003, 3: 1137-1155.
[37]	Morin F, Bengio Y. Hierarchical probabilistic neural network language model. Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, Barbados, 2005: 246-252.
[38]	Collobert R, Weston J. A unified architecture for natural language processing: deep neural networks with multitask learning. Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 2008: 160-167.
[39]	Gutmann M, Hyvärinen A. Noise-contrastive estimation: a new estimation principle for unnormalized statistical models. Proceedings of the International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 2010: 297-304.
[40]	Yang C, Liu Z Y. Comprehend DeepWalk as matrix factorization. arXiv preprint arXiv:1501.00358, 2015.
[41]	Goldberg Y, Levy O. word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722, 2014.
[42]	Li Y T, Xu L L, Tian F, et al. Word embedding revisited: a new representation learning and explicit matrix factorization perspective. Proceedings of the 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 2015: 3650-3656.
[43]	Yang C, Liu Z Y, Zhao D L, et al. Network representation learning with rich text information. Proceedings of the 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 2015: 2111-2117.
[44]	Yu H F, Jain P, Kar P, et al. Large-scale multi-label learning with missing labels. arXiv preprint arXiv:1307.5101, 2013.
[45]	Tang J, Qu M, Mei Q Z. PTE: predictive text embedding through large-scale heterogeneous text networks. Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 2015.
[46]	Ahmed A, Shervashidze N, Narayanamurthy S, et al. Distributed large-scale natural graph factorization. Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 2013: 37-48.

Network	Node	Label
DBLP [15]	Paper	Conference
Wikipedia [43]	Wikipedia entity	Category
Flickr [30]	User	Interest group
WebKB [6]	Web page	University