Chinese Journal of Network and Information Security ›› 2023, Vol. 9 ›› Issue (4): 53-63. doi: 10.11959/j.issn.2096-109x.2023053

• Papers

Twitter user geolocation method based on single-point toponym matching and local toponym filtering

Jin XUE1,2, Fuxiang YUAN2, Yimin LIU2, Meng ZHANG2, Yaqiong QIAO2,3, Xiangyang LUO2   

  1 School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450003, China
  2 Henan Key Laboratory of Cyberspace Situation Awareness, Zhengzhou 450001, China
  3 School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450045, China
  • Revised: 2023-04-18 Online: 2023-08-01 Published: 2023-08-01
  • Supported by:
    The National Natural Science Foundation of China (U1804263); The National Natural Science Foundation of China (U2172435); The National Natural Science Foundation of China (62272163); The National Key Research and Development Program of China (2022YFB3102900); Zhongyuan Science and Technology Innovation Leading Talent Project of China (214200510019); The Key Science and Technology Project of Henan Province (222102210036); The Henan Province Science Foundation for Youths (222300420230)

Abstract:

Accurate toponyms in user tweets are crucial for geolocating Twitter users. However, existing methods for locating Twitter users often acquire toponyms that are limited in quantity and reliability, which reduces the accuracy of user geolocation. To address this issue, a Twitter user geolocation method based on single-point toponym matching and local toponym filtering is proposed. First, a toponym type discrimination algorithm based on the aggregation degree of a toponym's locations is designed, and a single-point toponym database is generated with it so that more reliable toponyms can be extracted from tweets. Then, a local toponym filtering algorithm based on the aggregation degree of user locations is proposed: the aggregation degree of user locations is calculated, centered on the longitude and latitude of each toponym and on the average longitude and latitude of the users, and local toponyms with a high aggregation degree are extracted, which enhances the reliability of the toponyms used in geolocation. Finally, a user-toponym heterogeneous graph is constructed from user social relationships and user mentions of toponyms, and users are located by graph representation learning and neural networks. Extensive user geolocation experiments were conducted on two public datasets commonly used in this field, GEOTEXT and TW-US. Comparisons with nine typical existing methods for Twitter user geolocation, including HGNN, ReLP, and GCN, show that the proposed method achieves significantly higher geolocation accuracy. On the GEOTEXT dataset, the average error is reduced by 7.3 to 342.8 km, the median error is reduced by 2.4 to 354.4 km, and the accuracy of large area-level geolocation is improved by 1.3% to 26.3%. On the TW-US dataset, the average error is reduced by 8.6 to 246.6 km, the median error is reduced by 5.7 to 149.7 km, and the accuracy of large area-level geolocation is improved by 1.5% to 20.5%.
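
The abstract does not define the aggregation degree, so the following is only an illustrative sketch of how a local toponym filter of this kind might look, not the authors' algorithm. It assumes, hypothetically, that the aggregation degree is the share of mentioning users located within a fixed radius of the toponym's coordinates; the function names, the 161 km radius, and the 0.5 threshold are all assumptions made for illustration, and the sketch does not cover the second center (the average user location) mentioned in the abstract.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def aggregation_degree(center, points, radius_km=161.0):
    """ASSUMED definition: fraction of points within radius_km of center."""
    if not points:
        return 0.0
    near = sum(
        1 for lat, lon in points
        if haversine_km(center[0], center[1], lat, lon) <= radius_km
    )
    return near / len(points)


def filter_local_toponyms(toponym_coords, mentioning_users,
                          radius_km=161.0, threshold=0.5):
    """Keep toponyms whose mentioning users cluster around the toponym.

    toponym_coords:   {name: (lat, lon)} for single-point toponyms
    mentioning_users: {name: [(lat, lon), ...]} known user locations
    threshold:        hypothetical cut-off, not taken from the paper
    """
    kept = {}
    for name, center in toponym_coords.items():
        users = mentioning_users.get(name, [])
        degree = aggregation_degree(center, users, radius_km)
        if degree >= threshold:
            kept[name] = degree
    return kept


# Toy data (made up): two of three "Brooklyn" mentioners are near Brooklyn,
# while nobody mentioning "Paris" is near Paris, France.
toponyms = {"Brooklyn": (40.68, -73.94), "Paris": (48.86, 2.35)}
users = {
    "Brooklyn": [(40.71, -74.00), (40.73, -73.99), (34.05, -118.24)],
    "Paris": [(33.66, -95.56), (40.71, -74.00)],
}
local = filter_local_toponyms(toponyms, users)
```

With this toy data, only "Brooklyn" survives the filter, because most of its mentioning users lie close to its coordinates. A toponym mentioned mainly by distant users carries little location signal under this assumed definition and is dropped.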

Key words: user geolocation, user-generated text, toponym, social media
