Journal on Communications ›› 2020, Vol. 41 ›› Issue (12): 72-81. doi: 10.11959/j.issn.1000-436X.2020229

• Academic paper

Social media user location inference based on multiple mention relationships

Yaqiong QIAO (乔亚琼)1,2, Xiangyang LUO (罗向阳)1,2, Jiangtao MA (马江涛)3, Chenliang LI (李晨亮)4, Meng ZHANG (张萌)1,2, Ruixiang LI (李瑞祥)1,2

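  1. 1 School of Cyberspace Security, Information Engineering University, Zhengzhou 450001, Henan, China
    2 State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, Henan, China
    3 School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, Henan, China
    4 School of Cyber Science and Engineering, Wuhan University, Wuhan 430075, Hubei, China
  • Revised: 2020-07-20 Online: 2020-12-25 Published: 2020-12-01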
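  • About the authors: Yaqiong QIAO (1981- ), female, born in Kaifeng, Henan, is a doctoral student at Information Engineering University. Her main research interests are data mining and social network analysis.
    Xiangyang LUO (1978- ), male, born in Jingmen, Hubei, holds a Ph.D. and is a professor and doctoral supervisor at Information Engineering University. His main research interest is network and information security.
    Jiangtao MA (1981- ), male, born in Kaifeng, Henan, holds a Ph.D. and is a lecturer at Zhengzhou University of Light Industry. His main research interests are data mining and social network analysis.
    Chenliang LI (1983- ), male, born in Wuhan, Hubei, holds a Ph.D. and is an associate professor at Wuhan University. His main research interests are data mining, machine learning, social network analysis, information security, information retrieval, text/web mining, and natural language processing.
    Meng ZHANG (1996- ), female, born in Yanshi, Henan, is a doctoral student at Information Engineering University. Her main research interests are data mining and social network analysis.
    Ruixiang LI (1994- ), male, born in Hengyang, Hunan, is a doctoral student at Information Engineering University. His main research interests are network entity geolocation, data analysis, and information security.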
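  • Supported by:
    The National Natural Science Foundation of China (U1804263); The National Natural Science Foundation of China (U1636219); The National Natural Science Foundation of China (61872287); The National Natural Science Foundation of China (U1736214); The National Key Research and Development Program of China (2016QY01W0105); The National Key Research and Development Program of China (2016YFB0801303); Zhongyuan Talents Program-Zhongyuan Science and Technology Innovation Leading Talent Project (1052020KJLJ0025); The Plan for Scientific Innovation Talent of Henan Province (184200510018); The Scientific and Technological Project of Henan Province (202102310237)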

Social media user geolocalization based on multiple mention relationships

Yaqiong QIAO1,2, Xiangyang LUO1,2, Jiangtao MA3, Chenliang LI4, Meng ZHANG1,2, Ruixiang LI1,2   

  1. 1 Information Engineering University, Zhengzhou 450001, China
    2 State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China
    3 School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, China
    4 School of Cyber Science and Engineering, Wuhan University, Wuhan 430075, China
  • Revised: 2020-07-20 Online: 2020-12-25 Published: 2020-12-01
  • Supported by:
    The National Natural Science Foundation of China (U1804263); The National Natural Science Foundation of China (U1636219); The National Natural Science Foundation of China (61872287); The National Natural Science Foundation of China (U1736214); The National Key Research and Development Program of China (2016QY01W0105); The National Key Research and Development Program of China (2016YFB0801303); Zhongyuan Talents Program-Zhongyuan Science and Technology Innovation Leading Talent Project (1052020KJLJ0025); The Plan for Scientific Innovation Talent of Henan Province (184200510018); The Scientific and Technological Project of Henan Province (202102310237)

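Abstract:

To address the problem that existing joint location inference methods based on user-generated text and social relationships do not sufficiently mine the location correlations among heterogeneous data in social media, a social media user location inference method based on multiple mention relationships is proposed. First, a heterogeneous network containing three types of nodes (users, location indicative words, and geographic nouns) is constructed by jointly considering the mention relationships between users, users' mentions of location indicative words, and users' mentions of geographic nouns in social media text. Second, a user-word-location simplification algorithm based on common mention relationships is proposed to construct a user-location heterogeneous network, which ties users at nearby locations more closely together. Third, a biased random walk algorithm is proposed to sample the nodes of the graph, so as to fully explore the network structure and alleviate the sparsity of known locations. Finally, a neural network classifier based on a multilayer perceptron is used to infer user locations. Experimental results on three representative Twitter datasets, GEOTEXT, TW-US, and TW-WORLD, show that the proposed method significantly improves the accuracy of user location inference.

Key words: social media, heterogeneous network, user location inference, mention relationship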

Abstract:

To address the problem that existing joint user geolocalization methods based on social media text and social relationships do not sufficiently mine the location correlation between heterogeneous data in social media, a social media user geolocalization method based on multiple mention relationships was proposed. First, a heterogeneous network was constructed by comprehensively considering the mention relationships between users, users' mentions of location indicative words, and users' mentions of geographic nouns. Then, a network simplification strategy based on common mention relationships was proposed to construct a user-location heterogeneous network that connects users who live near one another more closely. After that, a biased random walk algorithm was proposed for graph node sampling, to fully explore the network structure and alleviate the sparsity of known locations. Finally, a neural network classifier based on a multilayer perceptron was used to infer each user's location. Experimental results on three representative Twitter datasets, GEOTEXT, TW-US, and TW-WORLD, show that the proposed method can significantly improve user geolocalization accuracy.
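
As a rough illustration of the graph construction and biased random walk steps named in the abstract, the following minimal Python sketch may help. It is not the authors' algorithm: the abstract does not specify the bias, so the sketch assumes transition probabilities proportional to the mention count multiplied by a per-node-type factor. The function names `build_graph` and `biased_walk`, the `type_bias` parameter, and the `user:`/`word:`/`place:` node prefixes are all hypothetical.

```python
import random
from collections import defaultdict


def build_graph(mentions):
    """Build an undirected weighted graph from (source, target) mention pairs.

    Node names carry a hypothetical type prefix ("user:", "word:", "place:");
    an edge weight counts how often the pair co-occurs as a mention.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for source, target in mentions:
        graph[source][target] += 1
        graph[target][source] += 1
    return graph


def biased_walk(graph, start, length, type_bias, rng):
    """Sample one walk of up to `length` nodes starting at `start`.

    ASSUMPTION: the bias is edge weight times a positive per-type factor
    taken from `type_bias` (default 1.0); the paper's actual bias may differ.
    """
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1])
        if not neighbors:
            break  # isolated node: stop early
        nodes = list(neighbors)
        weights = [
            neighbors[node] * type_bias.get(node.split(":", 1)[0], 1.0)
            for node in nodes
        ]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


# Toy data (made up for illustration only).
toy_mentions = [
    ("user:alice", "user:bob"),
    ("user:alice", "place:denver"),
    ("user:bob", "place:denver"),
    ("user:bob", "word:broncos"),
]
toy_graph = build_graph(toy_mentions)
toy_walk = biased_walk(toy_graph, "user:alice", 5, {"place": 2.0}, random.Random(7))
print(toy_walk)
```

In a pipeline of the kind the abstract describes, many such walks would presumably be turned into node representations and passed to the multilayer perceptron classifier; that part is omitted here because the abstract gives no details.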

Key words: social media, heterogeneous network, user geolocalization, mention relationship
