Big Data Research (大数据), 2019, Vol. 5, Issue (1): 98-108. doi: 10.11959/j.issn.2096-0271.2019008

• Applications •



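  • About the authors: Shiyao XIAO (1996- ), male, master's student at the School of Systems Engineering, National University of Defense Technology; main research area: big data analysis. | Wei LYU (1985- ), male, attending physician in the Department of Oncology, Kangya Hospital, Yiyang, Hunan; main research area: radiological medicine. | Saran CHEN (1989- ), male, PhD student at the School of Systems Engineering, National University of Defense Technology; main research areas: complex network theory, statistical sampling, and data mining. | Shuo QIN (1995- ), female, master's student at the School of Systems Engineering, National University of Defense Technology; main research area: spreading dynamics on complex networks. | Ge HUANG (1991- ), female, PhD student at the School of Systems Engineering, National University of Defense Technology; main research areas: big data and complex networks. | Mengsi CAI (1992- ), female, PhD student at the School of Systems Engineering, National University of Defense Technology; main research areas: social networks and big data. | Yuejin TAN (1958- ), male, professor at the School of Systems Engineering, National University of Defense Technology; main research area: complex networks. | Xu TAN (1981- ), male, professor at the School of Software Engineering, Shenzhen Institute of Information Technology; main research areas: intelligent decision-making, machine learning, and public opinion analysis. | Xin LU (1984- ), male, associate professor at the School of Systems Engineering, National University of Defense Technology; main research areas: big data, complex network theory, and emergency management.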

Analysis of HIV high-risk population characteristics with Baidu Tieba data

Shiyao XIAO¹, Wei LYU², Saran CHEN¹, Shuo QIN¹, Ge HUANG¹, Mengsi CAI¹, Yuejin TAN¹, Xu TAN³, Xin LU¹

  1. School of Systems Engineering, National University of Defense Technology, Changsha 410073, China
  2. Department of Oncology, Kangya Hospital, Yiyang 413002, China
  3. School of Software Engineering, Shenzhen Institute of Information Technology, Shenzhen 518172, China
  • Online: 2019-01-01  Published: 2019-02-01
  • Supported by:
    The National Natural Science Foundation of China (No. 91846301, No. 71771213, No. 71790615, No. 71690233); The MOE (Ministry of Education in China) Liberal Arts and Social Sciences Foundation (No. 17YJCZH157); The Pengcheng Scholar Funded Scheme





This paper analyzes the textual content and temporal patterns of online activity among users of the “Fear of HIV Bar” on Baidu Tieba. An LDA topic model is used to identify the main differences between the topics discussed by HIV-infected and non-HIV-infected users. A keyword-based machine learning method is used to infer the sexual orientation of users who start discussions in the “Fear of HIV Bar” and to estimate HIV prevalence among groups with different sexual orientations. These techniques can serve as an important supplementary tool for research on high-risk populations. In particular, automatically classifying users' sexual orientation with machine learning makes it possible to assess the HIV epidemic in populations with different sexual orientations, which is of great significance for public health agencies.

Key words: online high-risk populations, MSM, HIV, LDA topic model, Baidu Tieba, machine learning
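
The abstract does not say which classifier or keyword list the authors used, so the following is only a minimal sketch of the general idea: a keyword-based classifier assigns each thread starter to a group, and the share of self-reported positive users is then computed per predicted group. A Bernoulli naive Bayes model is assumed here purely for illustration. The keywords (`kw1` to `kw4`), the group labels (`group_a`, `group_b`), the `reported_positive` flag, and all data are hypothetical placeholders, not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical placeholder keywords; the paper's actual keyword list is not given.
KEYWORDS = ["kw1", "kw2", "kw3", "kw4"]


def featurize(text):
    """Binary presence of each keyword in a whitespace-tokenized post."""
    tokens = set(text.lower().split())
    return [1 if kw in tokens else 0 for kw in KEYWORDS]


def train_bernoulli_nb(samples):
    """Fit Bernoulli naive Bayes with Laplace smoothing.

    samples: list of (text, label) pairs.
    Returns (log_prior, cond_prob), both keyed by label.
    """
    label_counts = Counter(label for _, label in samples)
    hits = {label: [0] * len(KEYWORDS) for label in label_counts}
    for text, label in samples:
        for i, present in enumerate(featurize(text)):
            hits[label][i] += present
    total = len(samples)
    log_prior = {c: math.log(n / total) for c, n in label_counts.items()}
    cond_prob = {
        c: [(hits[c][i] + 1) / (label_counts[c] + 2) for i in range(len(KEYWORDS))]
        for c in label_counts
    }
    return log_prior, cond_prob


def predict(model, text):
    """Return the label with the highest posterior log-score."""
    log_prior, cond_prob = model
    x = featurize(text)
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for present, p in zip(x, cond_prob[c]):
            score += math.log(p if present else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


def rate_by_group(model, posts):
    """posts: list of (text, reported_positive). Returns {group: share positive}."""
    totals, positives = Counter(), Counter()
    for text, reported_positive in posts:
        group = predict(model, text)
        totals[group] += 1
        positives[group] += int(reported_positive)
    return {group: positives[group] / totals[group] for group in totals}


# Made-up labeled training posts (placeholder tokens only).
train = [
    ("kw1 kw2 other", "group_a"),
    ("kw1 filler", "group_a"),
    ("kw3 kw4 other", "group_b"),
    ("kw4 filler", "group_b"),
]
model = train_bernoulli_nb(train)

# Made-up unlabeled posts with an assumed self-reported status flag.
posts = [
    ("kw1 question", True),
    ("kw2 kw1 worry", False),
    ("kw3 question", False),
    ("kw4 kw3 worry", False),
]
rates = rate_by_group(model, posts)
print(rates)
```

A rate computed this way inherits any classification errors, so it is better read as a rough estimate among forum users than as a population prevalence.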

