Journal on Communications ›› 2016, Vol. 37 ›› Issue (8): 24-33. doi: 10.11959/j.issn.1000-436x.2016152

• Papers •

Feature importance analysis for spammer detection in Sina Weibo

Yu-xiang ZHANG1,2,3,Yu SUN1,Jia-hai YANG2,3,Da-lei ZHOU4,Xiang-fei MENG5,Chun-jing XIAO1   

  1 College of Computer Science, Civil Aviation University of China, Tianjin 300300, China
  2 Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing 100084, China
  3 Tsinghua National Laboratory for Information Science and Technology (TNList), Beijing 100084, China
  4 Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
  5 State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100876, China
  • Online: 2016-08-25 Published: 2016-09-01
  • Supported by:
    The National Basic Research Program of China (973 Program); The National Key Technology R&D Program of China; The National Natural Science Foundation of China; The National Natural Science Foundation of China; The National Natural Science Foundation of China; Ph.D. Programs Foundation of Ministry of Education of China

Abstract:

Microblogs have drawn the attention of not only legitimate users but also spammers. The junk information posted by spammers significantly degrades the user experience. To improve the accuracy of spammer detection, most existing studies focus either on generating more classification features or on proposing new classifiers. Which of these two directions deserves priority, given the enormous research effort involved? Do extensive features or novel classifiers contribute more to detection accuracy? This paper addresses these questions by combining different feature selection methods with different classifiers on a real Sina Weibo dataset. Experimental results show that selected features are more important than novel classifiers for spammer detection. In addition, features should be derived from a wide range of sources, such as text content, user behavior, and social relationships, and the feature dimension should not be too high. These results will be useful in identifying where future microblog anti-spam work can achieve a breakthrough.

Key words: Sina Weibo, feature definition, feature selection, spammer detection
