Chinese Journal of Network and Information Security ›› 2023, Vol. 9 ›› Issue (4): 144-154. doi: 10.11959/j.issn.2096-109x.2023060

• Papers

Construction of multi-modal social media dataset for fake news detection

Guopeng GAO1, Yaodong FANG1, Yanfang HAN1, Zhenxing QIAN2, Chuan QIN1   

  1 School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  2 School of Computer Science, Fudan University, Shanghai 200433, China
  • Revised: 2023-05-26 Online: 2023-08-01 Published: 2023-08-01
  • Supported by:
    The National Natural Science Foundation of China (U20B2051); The National Natural Science Foundation of China (62172280); The Natural Science Foundation of Shanghai (21ZR1444600)

Abstract:

The advent of social media has brought significant changes to people’s lives. While social media makes news easy to access and share, it has also become a breeding ground for fake news, posing a serious threat to social security and stability. Consequently, researchers have turned their attention to fake news detection. Although several deep learning-based solutions have been proposed, these methods rely heavily on large amounts of supporting data. Existing datasets are scarce, particularly in Chinese, and the news they collect is often limited to a single category. To improve fake news detection, a new multi-modal fake news dataset (MFND) was constructed, comprising Chinese and English news from ten diverse categories: politics, economy, entertainment, sports, international affairs, technology, military, education, health, and social life. The word frequencies and category distribution of the proposed dataset were analyzed, and the dataset was compared with existing fake news datasets in terms of number of news items, news categories, modal information, and news languages. The comparison shows that MFND excels in category information and news languages. Moreover, when typical existing fake news detection methods were trained and validated on MFND, model performance improved by approximately 10% compared with existing mainstream fake news datasets.
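
The word-frequency and category analysis mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the record fields (`text`, `category`, `label`), the sample records, and the helper names are hypothetical and do not reflect the actual MFND schema or the authors' analysis code. Only the ten category names come from the abstract. The tokenizer handles English only; Chinese text would need a word segmenter.

```python
from collections import Counter
import re

# The ten news categories listed in the abstract.
CATEGORIES = [
    "politics", "economy", "entertainment", "sports", "international affairs",
    "technology", "military", "education", "health", "social life",
]


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (English only)."""
    return re.findall(r"[a-z]+", text.lower())


def category_counts(records):
    """Count how many records fall into each category."""
    return Counter(r["category"] for r in records)


def word_frequencies(records, label):
    """Count word occurrences across all records carrying the given label."""
    counts = Counter()
    for r in records:
        if r["label"] == label:
            counts.update(tokenize(r["text"]))
    return counts


# Hypothetical records; the field names are assumed, not taken from MFND.
SAMPLE = [
    {"text": "Miracle cure found, doctors shocked", "category": "health", "label": "fake"},
    {"text": "City council approves new school budget", "category": "education", "label": "real"},
    {"text": "Shocked fans react to miracle comeback", "category": "sports", "label": "fake"},
]

if __name__ == "__main__":
    print(category_counts(SAMPLE))
    print(word_frequencies(SAMPLE, "fake").most_common(3))
```

Comparing the most frequent words under the fake and real labels, and the spread of records over the ten categories, gives a rough picture of the kind of statistics the abstract refers to.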

Key words: social media, fake news detection, multi-modal, multi-category, dataset
