Overview of detection techniques for malicious social bots

doi:10.11959/j.issn.1000-436x.2017275

Abstract

Attackers use malicious social bots to steal user privacy, spread false information, and influence public opinion, seriously threatening personal information security, public security, and even national security. Attackers also keep introducing new techniques to evade detection, so the detection of malicious social bots has become both a focus and a difficulty in online social network security research. The current state of social bot development and application is first reviewed; the detection problem is then formally defined, and the main challenges in detecting malicious social bots are analyzed. On the choice of detection features, the progression of research from static user features to dynamic propagation features and then to relationship and evolution features is clarified. On detection methods, existing schemes are summarized in four categories (feature-based, machine learning, graph theory, and crowdsourcing), and the limitations of these categories in detection accuracy, computational cost, and other respects are examined. Finally, a framework for detecting malicious social bots based on parallel-optimized machine learning methods is proposed.

Key words: social bots, online social network, feature engineering, machine learning, graph theory, crowdsourcing, parallelization

CLC number: TP391

Rong LIU, Bo CHEN, Ling YU, Ya-shang LIU, Si-yuan CHEN. Overview of detection techniques for malicious social bots[J]. Journal on Communications, 2017, 38(Z2): 197-210.
