Big Data Research ›› 2019, Vol. 5 ›› Issue (1): 98-108.doi: 10.11959/j.issn.2096-0271.2019008

Analysis of HIV high-risk population characteristics with Baidu Tieba data

Shiyao XIAO1, Wei LYU2, Saran CHEN1, Shuo QIN1, Ge HUANG1, Mengsi CAI1, Yuejin TAN1, Xu TAN3, Xin LU1

  1 School of Systems Engineering, National University of Defense Technology, Changsha 410073, China
  2 Department of Oncology, Kangya Hospital, Yiyang 413002, China
  3 School of Software Engineering, Shenzhen Institute of Information Technology, Shenzhen 518172, China
  • Online: 2019-01-01 Published: 2019-02-01
  • Supported by:
    The National Natural Science Foundation of China (No. 91846301); The National Natural Science Foundation of China (No. 71771213); The National Natural Science Foundation of China (No. 71790615); The National Natural Science Foundation of China (No. 71690233); The MOE (Ministry of Education in China) Liberal Arts and Social Sciences Foundation (No. 17YJCZH157); The Pengcheng Scholar Funded Scheme

Abstract:

This paper analyzes the textual content and temporal patterns of online activity among users of the “Fear of HIV Bar” on Baidu Tieba. An LDA topic model is used to identify the main differences between the topics discussed by HIV-infected and non-HIV-infected users. A keyword-based machine learning method is used to classify the sexual orientation of users who start a discussion in the “Fear of HIV Bar” and to calculate the HIV epidemic rate among groups with different sexual orientations. The techniques used in this paper can serve as an important supplementary tool for research on high-risk populations. In addition, because machine learning is used to classify the sexual orientation of a user automatically, the approach can be applied to assess the HIV epidemic in populations with different sexual orientations, which is of great significance for public health agencies.
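
To illustrate the topic-comparison step, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, followed by a per-group average of the document-topic proportions. The abstract does not state which inference method, hyperparameters, or preprocessing the authors used, so the function name `fit_lda`, the values of `alpha` and `beta`, the toy tokens, and the neutral labels `group_a` and `group_b` are all assumptions made for illustration only.

```python
import random


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (vocab, phi, theta): phi[k][v] is the probability of word v in
    topic k, theta[d][k] is the proportion of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + vocab_size * beta)
         for v in range(vocab_size)]
        for k in range(num_topics)
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return vocab, phi, theta


# Toy, invented posts (already tokenized); not data from the paper.
docs = [
    ["test", "window", "worry", "result", "test"],
    ["worry", "symptom", "test", "window"],
    ["treatment", "medication", "clinic", "dose"],
    ["medication", "clinic", "treatment", "result"],
]
# Hypothetical group label per post, standing in for the two user groups.
labels = ["group_a", "group_a", "group_b", "group_b"]

vocab, phi, theta = fit_lda(docs, num_topics=2)

# Average topic proportions per group to compare what each group discusses.
group_mean = {}
for g in sorted(set(labels)):
    rows = [theta[d] for d, lab in enumerate(labels) if lab == g]
    group_mean[g] = [sum(col) / len(rows) for col in zip(*rows)]
```

Comparing the entries of `group_mean` topic by topic, and reading off the highest-probability words in each row of `phi`, gives a rough picture of which topics are more prominent in one group than in the other. A real analysis would need Chinese word segmentation, stop-word removal, and a far larger corpus than this toy input.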

Key words: online high-risk populations, MSM, HIV, LDA topic model, Baidu Tieba, machine learning
