Journal on Communications, 2017, Vol. 38, Issue (1): 44-53. doi: 10.11959/j.issn.1000-436x.2017006
Yan-mei ZHANG, Ying-ying HUANG, Shi-jie GAN, Yi DING, Zhi-long MA
Revised: 2016-09-26
Online: 2017-01-01
Published: 2017-01-23
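Yan-mei ZHANG, Ying-ying HUANG, Shi-jie GAN, Yi DING, Zhi-long MA. Weibo spammers’ identification algorithm based on Bayesian model[J]. Journal on Communications, 2017, 38(1): 44-53.

Reference | Attributes | Main algorithm |
Ref. [ | URL rate, text self-similarity, and numbers of friends, followers, and posts, etc. | Logistic regression |
Ref. [ | Comment time, comment ID, source client, number of followers, etc. | SVM, simhash algorithm |
Ref. [ | Nickname, followee list, microblog text, comments, etc. | Relation graph combined with naive Bayes, Bayesian network, or decision tree |
Ref. [ | Three relation types (follower-spreader, publisher-spreader, spreader-spreader) defined to distinguish propagation features | Decision tree |
Ref. [ | Web page signature | Text analysis, B-Tree index |
Ref. [ | Comprehensive index, information entropy | Comprehensive index computation, entropy method |
Ref. [ | Registration time, nickname, active time | k-means clustering, depth-first search |
Ref. [ | Posting time, repost count, comment count, reposter ID, user ID, etc. | Support vector machine |
Ref. [ | User activity, user category, follower value, friend value, etc. | Probabilistic graphical model |
Ref. [ | Account follow level, mutual-follow ratio, @-mention ratio, etc. | Decision tree |
Ref. [ | Friend request rate, URL rate, text similarity, etc. | honey-profiles |
Ref. [ | Mutual-follow-to-following ratio, number of favorites, etc. | Trust-based matrix, PageRank algorithm |
Ref. [ | Messages, events, comments, etc. | Bag-of-words model |
Ref. [ | Mutual-follow-to-following ratio, number of favorites, daily number of new friends, etc. | SVM, repeated incremental pruning algorithm |
Ref. [ | Total out-degree (e.g., messages sent), total in-degree, total number of cycles, etc. | OCTracker algorithm |
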
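Symbol | Meaning |
FF | Follower-to-following ratio |
AW | Average number of microblogs posted |
IF | Number of mutual follows |
QE | Comprehensive quality evaluation |
C | Number of favorites |
I | Number of matrix rows; here, the number of attributes |
J | Number of matrix columns |
M | Non-spammer threshold matrix |
T | Non-spammer probability matrix |
N | Spammer threshold matrix |
S | Spammer probability matrix |
$x = \{a_1, a_2, \ldots, a_{m-1}, a_m\}$ | Unclassified data item; each $a$ is the value of one attribute of $x$ |
$B = \{y_1, y_2\}$ | Class set; $y_1$ means the data item is a non-spammer, $y_2$ means it is a spammer |
var | Threshold matrix (generic) |
population | Population matrix; each row represents one individual, and every 4 columns hold the gene value of one attribute of that individual |
TP (true positive) | Number of spammer samples predicted as spammers |
TN (true negative) | Number of non-spammer samples predicted as non-spammers |
FN (false negative) | Number of actual spammer samples that were predicted incorrectly |
FP (false positive) | Number of actual non-spammer samples that were predicted incorrectly |
acc+ | Classification accuracy of the classifier on spammer samples |
acc- | Classification accuracy of the classifier on non-spammer samples |
g | Average classification performance over the whole dataset (i.e., overall classification accuracy) |
SR (spammer recall) | Spammer recall |
LR (legitimate recall) | Non-spammer recall |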
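
As a minimal sketch of how the evaluation symbols in the table above relate to one another, the Python snippet below derives acc+ and acc- from the four confusion counts, treating spammers as the positive class. The function name `class_metrics` and the sample counts are illustrative only. The formula used for g (the geometric mean of acc+ and acc-, a common choice for imbalanced data) is an assumption, since the table describes g only in words.

```python
from math import sqrt


def class_metrics(tp: int, tn: int, fn: int, fp: int) -> dict:
    """Per-class accuracies from confusion counts (spammer = positive class)."""
    # acc+: fraction of actual spammers that were predicted as spammers.
    acc_pos = tp / (tp + fn) if (tp + fn) else 0.0
    # acc-: fraction of actual non-spammers that were predicted as non-spammers.
    acc_neg = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "acc_pos": acc_pos,
        "acc_neg": acc_neg,
        # Assumption: g is taken as the geometric mean of acc+ and acc-.
        "g": sqrt(acc_pos * acc_neg),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
m = class_metrics(tp=80, tn=90, fn=20, fp=10)
print(m)
```

Under the usual definition of recall, SR and LR would coincide with acc+ and acc- respectively; the table does not state whether the paper defines them this way.
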
[1] | SRIRAM B, FUHRY D, DEMIR E, et al. Short text classification in Twitter to improve information filtering[C]// 33rd Int’l ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR 2010). New York: ACM Press, 2010: 841-842. |
[2] | LIU B. Sentiment analysis and subjectivity[M]. Handbook of Natural Language Processing. Boca Raton: CRC Press, 2010: 627-666. |
[3] | ZHAO Y Y, QIN B, LIU T. Sentiment analysis[J]. Journal of Software, 2010, 21(8): 1834-1848. |
[4] | PARAMESWARAN M, RUI H, SAYIN S. A game theoretic model and empirical analysis of spammer strategies[C]// 7th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conf. 2010: 1-7. |
[5] | GARGARI S M, OGUDUCU S G. A novel framework for spammer detection in social bookmarking systems[C]// IEEE/ACM Int’l Conf. on Advances in Social Networks Analysis and Mining (ASONAM 2012). 2012: 827-834. |
[6] | MO Q, YANG K. Overview of Web spammer detection[J]. Journal of Software, 2014, 25(7): 1505-1526. (in Chinese) |
[7] | KRESTEL R, CHEN L. Using co-occurrence of tags and resources to identify spammers[C]// Discovery Challenge Workshop at the European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2008). 2008: 38-46. |
[8] | GAYO-AVELLO D, BRENES D J. Overcoming spammers in Twitter: a tale of five algorithms[C]// Spanish Conf. on Information Retrieval (CERI 2010). 2010: 41-52. |
[9] | HAN Z M, YANG K, TAN X S. Analyzing spectrum features of weight user relation graph to identify large spammer groups in online shopping websites[J/OL]. . (in Chinese) |
[10] | ZHANG L, ZHU X, LI A P, et al. The Spammer detection based on logistic regression[J]. Information Security and Technology, 2015(4): 57-62. (in Chinese) |
[11] | YE S R, SUN N. Research on Sina microblogging marketing spam review detection based on support vector machine[J]. Natural Science Journal of Xiangtan University, 2015, 37(4): 70-74. (in Chinese) |
[12] | CHENG X T, LIU C X, LIU S X. Graph-based features for identifying spammers in microblog networks[J]. Acta Automatica Sinica, 2015, 41(9): 1533-1541. (in Chinese) |
[13] | CHEN K, CHEN L, ZHU P D, et al. Interaction based on method for spam detection in online social networks[J]. Journal on Communications, 2015, 36(7): 120-127. (in Chinese) |
[14] | YANG C C, XU X S, YE S R, et al. A method to find water armies in weibo based on text similarity[J]. Microelectronics & Computer, 2014, 31(3): 82-85. (in Chinese) |
[15] | YUAN X P, WANG R W, ZHAI B Y. Automatic recognition of micro-blog water army based on multi-index comprehensive index method and entropy method[J]. Journal of Intelligence, 2014, 33(7): 176-179. (in Chinese) |
[16] | NI P, ZHANG Y Q, WEN G X, et al. Detection of socialbot networks based on population characteristics[J]. Journal of University of Chinese Academy of Sciences, 2015, 31(5): 691-700. (in Chinese) |
[17] | DONG Y C, LIU Y, LUO J Y, et al. Hype microblog recognition method based on support vector machine[J]. Computer Engineering, 2015, 41(3): 7-14. (in Chinese) |
[18] | HAN Z M, XU F M, DUAN D G. Probabilistic graphical model for identifying water army in microblogging system[J]. Journal of Computer Research and Development, 2013, 50(S2): 180-186. (in Chinese) |
[19] | LIU K, YUAN Y Y, LIU P. A Weibo bot-users identification model based on random forest[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2015, 52(2): 290-300. (in Chinese) |
[20] | STRINGHINI G, KRUEGEL C, VIGNA G. Detecting spammers on social networks[C]// 26th Annual Computer Security Applications Conf. (ACSAC 2010). 2010: 1-9. |
[21] | MURMANN A J. Enhancing spammer detection in online social networks with trust-based metrics[D]. San Jose: San Jose State University, 2009. |
[22] | SRIRAM B, FUHRY D, DEMIR E, et al. Short text classification in Twitter to improve information filtering[C]// 33rd Int’l ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR 2010). 2010: 841-842. |
[23] | MOH T S, MURMANN A J. Can you judge a man by his friends? Enhancing spammer detection on the Twitter microblogging platform using friends and followers[C]// Int’l Conf. on Information Systems and Technology Management (ICISTM 2010). 2010: 210-220. |
[24] | BHAT S Y, ABULAISH M. Community-based features for identifying spammers in online social networks[C]// 2013 IEEE/ACM Int’l Conf. on Advances in Social Networks Analysis and Mining (ASONAM 2013). 2013: 100-107. |
[25] | PAN Z M. Research on classification for imbalanced dataset[D]. Xi’an: Xi’an University of Architecture and Technology, 2012: 2-49. (in Chinese) |