Weibo spammers’ identification algorithm based on Bayesian model

doi:10.11959/j.issn.1000-436x.2017006

Abstract

To identify spammers efficiently, a classifier based on behavioral characteristics was established. Drawing on previous research and the latest data set, five attributes were selected: the follower ratio, the total number of blog posts, the number of friends, a comprehensive quality evaluation, and the number of favorites. A Weibo spammer identification algorithm was then implemented based on a Bayesian model and a genetic algorithm. Experimental results on real-time Sina Weibo data verify that the Bayesian-model recognition algorithm can ensure spammer recognition accuracy without sacrificing the recognition rate for non-spammers, and that the proposed threshold matrix optimization significantly improves spammer recognition accuracy.

Key words: network spammer, spammer identification, Weibo, Bayesian model, genetic algorithm

CLC Number:

TP393

Yan-mei ZHANG, Ying-ying HUANG, Shi-jie GAN, Yi DING, Zhi-long MA. Weibo spammers’ identification algorithm based on Bayesian model[J]. Journal on Communications, 2017, 38(1): 44-53.

References

[1]	SRIRAM B, FUHRY D, DEMIR E, et al. Short text classification in Twitter to improve information filtering[C]// 33rd Int’l ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR 2010). New York: ACM Press, 2010: 841-842.
[2]	LIU B. Sentiment analysis and subjectivity[M]. Handbook of Natural Language Processing. Boca Raton: CRC Press, 2010: 627-666.
[3]	ZHAO Y Y, QIN B, LIU T. Sentiment analysis[J]. Journal of Software, 2010, 21(8): 1834-1848.
[4]	PARAMESWARAN M, RUI H, SAYIN S. A game theoretic model and empirical analysis of spammer strategies[C]// 7th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conf. 2010: 1-7.
[5]	GARGARI S M, OGUDUCU S G. A novel framework for spammer detection in social bookmarking systems[C]// IEEE/ACM Int’l Conf. on Advances in Social Networks Analysis and Mining (ASONAM 2012). 2012: 827-834.
[6]	MO Q, YANG K. Overview of Web spammer detection[J]. Journal of Software, 2014, 25(7): 1505-1526.
[7]	KRESTEL R, CHEN L. Using co-occurrence of tags and resources to identify spammers[C]// Discovery Challenge Workshop at the European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2008). 2008: 38-46.
[8]	GAYO-AVELLO D, BRENES D J. Overcoming spammers in Twitter - a tale of five algorithms[C]// Spanish Conf. on Information Retrieval (CERI 2010). 2010: 41-52.
[9]	HAN Z M, YANG K, TAN X S. Analyzing spectrum features of weight user relation graph to identify large spammer groups in online shopping websites[J/OL]. .
[10]	ZHANG L, ZHU X, LI A P, et al. The Spammer detection based on logistic regression[J]. Information Security and Technology, 2015(4): 57-62.
[11]	YE S R, SUN N. Research on Sina microblogging marketing spam review detection based on support vector machine[J]. Natural Science Journal of Xiangtan University, 2015, 37(4): 70-74.
[12]	CHENG X T, LIU C X, LIU S X. Graph-based features for identifying spammers in microblog networks[J]. Acta Automatica Sinica, 2015, 41(9): 1533-1541.
[13]	CHEN K, CHEN L, ZHU P D, et al. Interaction based on method for spam detection in online social networks[J]. Journal on Communications, 2015, 36(7): 120-127.
[14]	YANG C C, XU X S, YE S R, et al. A method to find water armies in weibo based on text similarity[J]. Microelectronics & Computer, 2014, 31(3): 82-85.
[15]	YUAN X P, WANG R W, ZHAI B Y. Automatic recognition of micro-blog water army based on multi-index comprehensive index method and entropy method[J]. Journal of Intelligence, 2014, 33(7): 176-179.
[16]	NI P, ZHANG Y Q, WEN G X, et al. Detection of socialbot networks based on population characteristics[J]. Journal of University of Chinese Academy of Sciences, 2015, 31(5): 691-700.
[17]	DONG Y C, LIU Y, LUO J Y, et al. Hype microblog recognition method based on support vector machine[J]. Computer Engineering, 2015, 41(3): 7-14.
[18]	HAN Z M, XU F M, DUAN D G. Probabilistic graphical model for identifying water army in microblogging system[J]. Journal of Computer Research and Development, 2013, 50(S2): 180-186.
[19]	LIU K, YUAN Y Y, LIU P. A Weibo bot-users indentification model based on random forest[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2015, 52(2): 290-300.
[20]	STRINGHINI G, KRUEGEL C, VIGNA G. Detecting spammers on social networks[C]// 26th Annual Computer Security Applications Conf. (ACSAC 2010). 2010: 1-9.
[21]	MURMANN A J. Enhancing spammer detection in online social networks with trust-based metrics[D]. San Jose: San Jose State University, 2009.
[22]	SRIRAM B, FUHRY D, DEMIR E, et al. Short text classification in Twitter to improve information filtering[C]// 33rd Int’l ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR 2010). 2010: 841-842.
[23]	MOH T S, MURMANN A J. Can you judge a man by his friends? Enhancing spammer detection on the Twitter microblogging platform using friends and followers[C]// Int’l Conf. on Information Systems and Technology Management (ICISTM 2010). 2010: 210-220.
[24]	BHAT S Y, ABULAISH M. Community-based features for identifying spammers in online social networks[C]// 2013 IEEE/ACM Int’l Conf. on Advances in Social Networks Analysis and Mining (ASONAM 2013). 2013: 100-107.
[25]	PAN Z M. Research on classification for imbalanced dataset[D]. Xi’an: Xi’an University of Architecture and Technology, 2012: 2-49.

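Reference	Attributes	Main algorithm
Ref. [10]	URL ratio, text self-similarity, number of friends, number of followers, number of posts, etc.	Logistic regression
Ref. [11]	Comment time, commenter ID, client source, number of followers, etc.	SVM, simhash algorithm
Ref. [12]	Nickname, followee list, post text, comments, etc.	Relationship graph combined with naive Bayes, Bayesian network, or decision tree
Ref. [13]	Three relation types (follower-spreader, publisher-spreader, spreader-spreader) used to distinguish propagation features	Decision tree
Ref. [14]	Web page feature code	Text analysis, B-Tree index
Ref. [15]	Comprehensive index, information entropy value	Comprehensive index calculation, entropy method
Ref. [16]	Registration time, nickname, active time	k-means clustering, depth-first search
Ref. [17]	Posting time, repost count, comment count, reposter ID, user ID, etc.	Support vector machine
Ref. [18]	User activity, user category, follower value, friend value, etc.	Probabilistic graphical model
Ref. [19]	Account attention level, mutual-follow proportion, @ proportion, etc.	Decision tree
Ref. [20]	Friend request rate, URL ratio, text similarity, etc.	honey-profiles
Ref. [21]	Mutual-follow-to-following ratio, number of favorites, etc.	trust-based matrix, PageRank algorithm
Ref. [22]	Messages, events, comments, etc.	Bag-of-words model
Ref. [23]	Mutual-follow-to-following ratio, number of favorites, friends added per day, etc.	SVM, repeated incremental pruning algorithm
Ref. [24]	Total out-degree (e.g., messages sent), total in-degree, total number of cycles, etc.	OCTracker algorithm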

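Symbol	Meaning
FF	Follower-to-followee ratio
AW	Average number of posts published
IF	Number of mutual follows
QE	Comprehensive quality evaluation
C	Number of favorites
I	Number of matrix rows; here, the number of attributes
J	Number of matrix columns
M	Non-spammer threshold matrix
T	Non-spammer probability matrix
N	Spammer threshold matrix
S	Spammer probability matrix
x = {a₁, a₂, …, a_{m-1}, a_m}	An unclassified data record; each a is the value of one attribute of x
B = {y₁, y₂}	The set of classes; y₁ means the record is a non-spammer, y₂ means the record is a spammer
var	Threshold matrix (generic)
population	Population matrix; each row is one individual, and every 4 columns hold the gene values of one attribute of that individual
TP (true positive)	Number of spammer samples predicted as spammers
TN (true negative)	Number of non-spammer samples predicted as non-spammers
FN (false negative)	Number of actual spammer samples predicted incorrectly
FP (false positive)	Number of actual non-spammer samples predicted incorrectly
acc⁺	Classification accuracy of the classifier on spammer samples
acc⁻	Classification accuracy of the classifier on non-spammer samples
g	Average classification performance over the whole data set (i.e., overall classification accuracy)
SR (spammer recall)	Spammer recall
LR (legitimate recall)	Non-spammer recall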
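
To illustrate how the threshold matrices (M, N) and probability matrices (T, S) in the table above might be used together, the following is a minimal sketch of a discretized naive Bayes decision. It is not the authors' implementation. It assumes that each row of a threshold matrix holds sorted cut points for one attribute, that the matching row of the probability matrix holds one conditional probability per resulting interval, and that the class with the larger posterior score wins. The function names, the equal priors, and all numbers are hypothetical.

```python
import bisect
import math


def interval_index(value, cut_points):
    """Index of the interval that value falls into, given sorted cut points."""
    return bisect.bisect_right(cut_points, value)


def log_score(x, thresholds, probabilities, prior):
    """log(prior) plus the log interval probability of every attribute."""
    score = math.log(prior)
    for value, cuts, row in zip(x, thresholds, probabilities):
        score += math.log(row[interval_index(value, cuts)])
    return score


def classify(x, M, T, N, S, prior_legit=0.5, prior_spam=0.5):
    """Compare the non-spammer score (M, T) with the spammer score (N, S)."""
    legit = log_score(x, M, T, prior_legit)
    spam = log_score(x, N, S, prior_spam)
    return "spammer" if spam > legit else "non-spammer"


# Toy matrices for two attributes (FF and C); every value here is made up.
M = [[0.5, 2.0], [10, 100]]               # non-spammer cut points
T = [[0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]    # non-spammer interval probabilities
N = [[0.1, 1.0], [5, 50]]                 # spammer cut points
S = [[0.7, 0.2, 0.1], [0.8, 0.15, 0.05]]  # spammer interval probabilities

low_activity = classify([0.05, 2], M, T, N, S)
high_activity = classify([3.0, 150], M, T, N, S)
```

Working in log space avoids underflow when many attributes are multiplied; the sketch assumes every interval probability is strictly positive.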

Class	Actual spammer	Actual non-spammer
Predicted spammer	TP	FP
Predicted non-spammer	FN	TN
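
As a rough illustration of how the evaluation measures relate to this confusion matrix, the sketch below computes the per-class accuracies from the four counts. The formulas are the standard ones and are an assumption here, since this page does not show the paper's own definitions. Under these definitions acc⁺ coincides with spammer recall and acc⁻ with non-spammer recall. Because g is described only as an average performance over the whole data set, both a geometric mean of the two class accuracies and the plain overall accuracy are computed, without claiming which one the paper uses. The function name and the counts are hypothetical.

```python
import math


def ratio(numerator, denominator):
    """Divide, rejecting an empty class instead of failing with ZeroDivisionError."""
    if denominator == 0:
        raise ValueError("class has no samples")
    return numerator / denominator


def evaluate(tp, fp, fn, tn):
    """Per-class accuracy and two candidate overall measures from the counts."""
    acc_pos = ratio(tp, tp + fn)  # share of actual spammers found
    acc_neg = ratio(tn, tn + fp)  # share of actual non-spammers kept
    return {
        "acc_pos": acc_pos,
        "acc_neg": acc_neg,
        # Assumption: geometric mean, common for imbalanced data sets.
        "g_mean": math.sqrt(acc_pos * acc_neg),
        # Alternative reading of g: plain overall accuracy.
        "overall_accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


# Made-up counts, used only to exercise the function.
metrics = evaluate(tp=80, fp=10, fn=20, tn=90)
```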
