Telecommunications Science ›› 2017, Vol. 33 ›› Issue (11): 73-82. doi: 10.11959/j.issn.1000-0801.2017313

• Research and Development •

Design and implementation of spam filtering system based on topic model

Xiaohuai KOU, Hua CHENG

  1. College of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Revised: 2017-09-16 Online: 2017-11-01 Published: 2017-12-08

Abstract:

Spam filtering plays a key role in information security, transmission efficiency, and automatic information classification. Spam degrades the user experience and can cause unnecessary losses of money and time. The shortcomings of existing spam filtering techniques were studied, and a naive Bayes spam classification method based on multiple keywords was proposed. For topic extraction, an LDA topic model was used to obtain the topics of a message and their keywords, and Word2Vec was then used to find synonyms and related words for those keywords, extending the keyword set. For classification, the prior probabilities of the words in the training dataset were obtained by statistical learning. Based on the extended keyword set and these probabilities, the joint probability of a topic and a message was derived with Bayes' formula and used as the basis for the spam decision. The resulting topic-model-based spam filtering system is simple and easy to apply. Comparative experiments against other typical spam filtering methods show that the topic-model-based classification method and its Word2Vec-based improvement can effectively improve the accuracy of spam filtering.
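
The classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the extended keyword set (which the paper obtains from LDA and Word2Vec) is already available and hard-codes a hypothetical one, it reduces the decision to a two-class comparison (spam versus legitimate mail), and it uses Laplace smoothing, which the abstract does not mention. All function names, the sample messages, and the equal class priors are illustrative assumptions.

```python
import math
from collections import Counter


def train_word_probs(docs, keywords, alpha=1.0):
    """Estimate P(keyword | class) from one class's training messages.

    Only words in `keywords` are counted; `alpha` is a Laplace smoothing
    constant (an assumption, not taken from the paper).
    """
    counts = Counter()
    total = 0
    for doc in docs:
        for word in doc.lower().split():
            if word in keywords:
                counts[word] += 1
                total += 1
    denom = total + alpha * len(keywords)
    return {w: (counts[w] + alpha) / denom for w in keywords}


def log_score(message, word_probs, prior):
    """Log of the joint probability of the class and the message's keywords."""
    score = math.log(prior)
    for word in message.lower().split():
        if word in word_probs:
            score += math.log(word_probs[word])
    return score


def is_spam(message, spam_probs, ham_probs, p_spam=0.5):
    """Label a message as spam when the spam joint probability is larger."""
    spam_score = log_score(message, spam_probs, p_spam)
    ham_score = log_score(message, ham_probs, 1.0 - p_spam)
    return spam_score > ham_score


# Hypothetical extended keyword set; in the paper this would come from
# LDA topic keywords expanded with Word2Vec neighbours.
keywords = {"free", "prize", "winner", "meeting", "report"}

# Tiny made-up training sets, for illustration only.
spam_docs = ["free prize winner", "claim your free prize"]
ham_docs = ["meeting report attached", "project meeting tomorrow"]

spam_probs = train_word_probs(spam_docs, keywords)
ham_probs = train_word_probs(ham_docs, keywords)

print(is_spam("free prize inside", spam_probs, ham_probs))
print(is_spam("meeting report", spam_probs, ham_probs))
```

Working in log space avoids numerical underflow when many keyword probabilities are multiplied, and restricting the vocabulary to the keyword set mirrors the abstract's idea of basing the decision on topic keywords rather than on every word in the message.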

Key words: text classification, spam, topic model, Bayesian theory

