网络与信息安全学报 ›› 2022, Vol. 8 ›› Issue (2): 139-149.doi: 10.11959/j.issn.2096-109x.2021099

• 学术论文 • 上一篇    下一篇

基于标签传播的两阶段社区检测算法

孙学良1, 王巍2, 黄俊恒2, 辛国栋2, 王佰玲2   

  1. 1 哈尔滨工业大学计算学部,黑龙江 哈尔滨 150001
    2 哈尔滨工业大学(威海)计算机科学与技术学院,山东 威海 264209
  • 修回日期:2021-10-08 出版日期:2022-04-15 发布日期:2022-04-01
  • 作者简介:孙学良(1999− ),男,山东德州人,哈尔滨工业大学硕士生,主要研究方向为社交网络、机器学习
    王巍(1975− ),女,黑龙江齐齐哈尔人,哈尔滨工业大学(威海)助理教授,主要研究方向为社交网络,数据挖掘,机器学习和自然语言处理
    黄俊恒(1966− ),河南新乡人,哈尔滨工业大学(威海)副教授,主要研究方向为社交网络、数据挖掘、人工智能
    辛国栋(1976− ),男,山东青岛人,哈尔滨工业大学(威海)助理教授,主要研究方向为金融安全、网络安全、社交网络
    王佰玲(1978− ),男,黑龙江哈尔滨人,哈尔滨工业大学(威海)教授、博士生导师,主要研究方向为信息对抗、信息安全、信息搜索
  • 基金资助:
    国家重点研发计划(2020YFB2009502);中央高校基本科研业务费专项(HIT.NSRIF.2020098);山东省重点研发计划(2017CXGC0706)

Two-stage community detection algorithm based on label propagation

Xueliang SUN1, Wei WANG2, Junheng HUANG2, Guodong XIN2, Bailing WANG2   

  1. 1 Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
    2 School of Computer Science and Technology, Harbin Institute of Technology, Weihai 264209, China
  • Revised:2021-10-08 Online:2022-04-15 Published:2022-04-01
  • Supported by:
    The National Key R&D Program of China(2020YFB2009502);The Fundamental Research Funds for the Central Universities(HIT.NSRIF.2020098);The Key R&D Program of Shandong Province(2017CXGC0706)

摘要:

社区检测是复杂网络分析的重要研究任务之一,其检测结果有助于人们深入理解复杂网络的社区结构,同时为下游任务提供支持,如内容推荐、链路检测等。针对复杂网络的社区检测问题,提出了一种基于标签传播的两阶段社区检测算法—— TS-LPA。TS-LPA采用扩展邻域的思想来量化节点的传播能力,并在此基础上,利用节点信息和网络中边的权重等信息,提出了新的评价指标来衡量节点的中心性和节点之间的影响力。所提算法在计算节点中心性的基础上确定了节点标签更新的顺序和种子节点的选择策略,消除了算法在更新过程中的不稳定。在节点标签更新的过程中,为了更好地利用邻居节点标签类别来进行标签更新,TS-LPA 采用广度优先传播的思想,提出了第二阶段标签传播方式。当标签开始传播的时候,待更新节点的所有邻居节点都对该节点的类别标签产生影响,同时,为了减轻周围邻居节点对待更新节点的支配程度,除邻居节点的影响外,加入附近种子节点对待更新节点的影响,共同完成节点的标签更新。在不同的真实数据集和人工合成数据集的实验结果分析表明,TS-LPA 在消除随机性、表现出较强稳定性的同时,有效提高了社区检测的质量。

关键词: 社区检测, 标签传播, 节点中心性, 广度优先算法

Abstract:

Community detection is an important research topic of complex network analysis.The detection results help to understand the community structure of complex networks and provide support for downstream tasks, such as content recommendation, link detection.Considering the challenge of community detection in complex networks, a two-stage community detection algorithm based on label propagation (TS-LPA) was proposed.The TS-LPA algorithm quantified the propagation capability of nodes based on the extended neighborhood.Then a new evaluation index was proposed to measure the probability of influence between nodes using the information of nodes and the weight of edges in the network.Based on the calculation of node centrality, the algorithm determined the updating sequence of node labels and the selection strategy of seed nodes, which eliminated the instability of the algorithm in the updating process.The TS-LPA algorithm used the breadth-first propagation idea and introduced the second-stage label propagation method to improve the quality of community detection.When label began to spread, all neighboring nodes had an influence on the label of the related node.Meanwhile, the influence of neighboring seed nodes was added to complete label updating, in order to reduce the dominance of neighboring nodes on the updated node.The experimental results of different real data sets and synthetic data sets show that TS-LPA algorithm can eliminate randomness and show strong stability while effectively improving the quality of community detection.

Key words: community detection, label propagation, node centrality, breadth-first search

中图分类号: 

No Suggested Reading articles found!