Using deep learning for detecting BotCloud

doi:10.11959/j.issn.1000-436x.2016228

Abstract:

The basic network flow characteristics of BotCloud and normal cloud services differ only slightly, so traditional detection methods based on network flow characteristic analysis fail at BotCloud detection. To address this problem, a method based on a convolutional neural network (CNN) is proposed for detecting BotCloud. First, basic features are extracted from the network flows. Second, these features are mapped into a grayscale image. Finally, a CNN learns from the image and extracts more abstract features that express the hidden patterns and structural relationships in the network flow data, and these features are used to detect BotCloud. The experimental results show that the proposed method both improves detection accuracy and reduces the time required for detection.

Key words: BotCloud, cloud security, deep learning, network flow, feature, convolutional neural network

Guang KOU, Guang-ming TANG, Shuo WANG, Hai-tao SONG, Yuan BIAN. Using deep learning for detecting BotCloud[J]. Journal on Communications, 2016, 37(11): 114-128.
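
The abstract states that the basic flow features are mapped into a grayscale image before the CNN is applied, but it does not say how. The sketch below shows only one hypothetical way to do this, not the authors' mapping: each feature is min-max scaled to the range 0 to 255 and the resulting values are repeated row by row until a square image is filled. The function name `to_gray_image`, the per-feature bounds `lows` and `highs`, and the tiling scheme are all assumptions made for illustration.

```python
def to_gray_image(features, lows, highs, side):
    """Map a numeric feature vector to a side x side grayscale image.

    Assumed scheme (not from the source): min-max scale every feature
    to 0..255, then repeat the scaled values until the image is full.
    """
    pixels = []
    for x, lo, hi in zip(features, lows, highs):
        # A constant feature carries no information; map it to 0.
        scaled = 0.0 if hi == lo else (x - lo) / (hi - lo)
        # Clamp values that fall outside the bounds seen in training.
        scaled = min(1.0, max(0.0, scaled))
        pixels.append(round(scaled * 255))
    # Tile the scaled values so that they fill side * side pixels.
    flat = [pixels[i % len(pixels)] for i in range(side * side)]
    return [flat[r * side:(r + 1) * side] for r in range(side)]


# Three hypothetical feature values, each bounded by 0 and 10.
image = to_gray_image([0, 2, 10], [0, 0, 0], [10, 10, 10], side=3)
for row in image:
    print(row)
```

In practice the bounds would have to be estimated on the training flows only, and string features such as IP addresses or the protocol would need a numeric encoding first; neither step is described in the text above.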



No.	Feature	Description	Type
1	source IP	Source IP address	string
2	destination IP	Destination IP address	string
3	source port	Source port number	integer
4	destination port	Destination port number	integer
5	protocol	Protocol type	string
6	PX (total number of packets exchanged)	Total number of packets	integer
7	NNP (number of null packets exchanged)	Number of null packets	integer
8	IOPR (ratio of the number of incoming packets to the number of outgoing packets)	Ratio of incoming to outgoing packets	float
9	reconnect (number of reconnects)	Number of reconnections	integer
10	duration (flow duration)	Duration of the flow	float
11	FPS (length of the first packet)	Length of the first packet	integer
12	TBT (total number of bytes)	Total number of bytes	integer
13	average bytes per packet	Average number of bytes per packet	float
14	variance of bytes per packet	Variance of the number of bytes per packet	float
15	APL (average payload packet length)	Average packet length	float
16	DPL (total number of packets with the same length over the total number of packets)	Ratio of the number of same-length packets to the total number of packets	float
17	PV (standard deviation of payload packet length)	Standard deviation of packet length	float
18	BS (average bits-per-second)	Average bits per second	float
19	AIT (average inter arrival time of packets)	Average inter-arrival time of packets	float
20	PPS (average packets-per-second)	Average packets per second	float
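
Several of the features in the table above follow directly from the packet sizes and arrival times of a single flow. The sketch below illustrates that arithmetic under stated assumptions: a flow is given as a list of `(timestamp_seconds, size_bytes)` pairs in arrival order, a "null packet" is taken to be a packet of size 0, and the variance is the population variance. The function name `flow_features` and the input layout are hypothetical, not taken from the paper.

```python
from statistics import mean, pvariance


def flow_features(packets):
    """Compute a few of the listed features for one flow.

    packets: list of (timestamp_seconds, size_bytes) in arrival order.
    """
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    duration = times[-1] - times[0]
    total_bytes = sum(sizes)
    # Gaps between consecutive packets, used for the inter-arrival time.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "PX": len(packets),                      # total number of packets
        "NNP": sum(1 for s in sizes if s == 0),  # assumed: size 0 means null
        "duration": duration,
        "FPS": sizes[0],                         # length of the first packet
        "TBT": total_bytes,
        "avg_bytes_per_packet": mean(sizes),
        "var_bytes_per_packet": pvariance(sizes),
        "BS": total_bytes * 8 / duration if duration > 0 else 0.0,
        "AIT": mean(gaps) if gaps else 0.0,
        "PPS": len(packets) / duration if duration > 0 else 0.0,
    }


# A made-up four-packet flow lasting two seconds.
example_flow = [(0.0, 60), (0.5, 0), (1.0, 1500), (2.0, 60)]
features = flow_features(example_flow)
for name, value in features.items():
    print(name, value)
```

Features that depend on packet direction or connection state, such as IOPR and reconnect, need more than sizes and timestamps and are left out of this sketch.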

No.	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
0	√				√	√	√			√	√	√	√		√	√
1	√	√				√	√	√			√	√	√	√		√
2	√	√	√				√	√	√			√		√	√	√
3		√	√	√			√	√	√				√		√	√
4			√	√	√			√	√				√	√		√
5				√	√	√			√					√	√	√

Dataset	Label	Number of flows (proportion)	Total
Training	Normal	106 725 (78.6%)	135 697
Training	Attack	28 972 (21.4%)	135 697
Test	Normal	12 716 (69.7%)	18 194
Test	Attack	5 478 (30.3%)	18 194

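No.	C1 convolutional layer		S2 subsampling layer		C3 convolutional layer		S4 subsampling layer		C5 fully connected layer	
	Kernels	Output	Sampling window	Output	Kernels	Output	Sampling window	Output	Kernels	Output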
1	6×(3×3)	6×(18×18)	2×2	6×(9×9)	16×(3×3)	16×(7×7)	2×2	16×(4×4)	80×(4×4)	80×1
2	6×(3×3)	6×(18×18)	2×2	6×(9×9)	16×(3×3)	16×(7×7)	2×2	16×(3×3)	80×(3×3)	80×1
3	6×(3×3)	6×(18×18)	2×2	6×(9×9)	16×(4×4)	16×(6×6)	2×2	16×(3×3)	80×(3×3)	80×1
4	6×(4×4)	6×(17×17)	2×2	6×(9×9)	16×(4×4)	16×(6×6)	2×2	16×(3×3)	80×(3×3)	80×1
5	6×(5×5)	6×(16×16)	2×2	6×(8×8)	16×(5×5)	16×(4×4)	2×2	16×(2×2)	80×(2×2)	80×1
6	6×(6×6)	6×(15×15)	2×2	6×(8×8)	16×(6×6)	16×(3×3)	2×2	16×(2×2)	80×(2×2)	80×1
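
The output sizes in the table above can be checked with simple arithmetic. The sketch below assumes a 20×20 input image (inferred from the 18×18 output of a 3×3 kernel, not stated in the table), unpadded stride-1 convolution, and 2×2 subsampling that rounds up. Under those assumptions it reproduces the C1 to S4 sizes of structures 1, 3, 4, 5 and 6. Structure 2 lists 3×3 after S4, which would correspond to rounding down at that layer. The helper names are hypothetical.

```python
import math


def conv_out(n, k):
    """Side length after an unpadded, stride-1 convolution with a k x k kernel."""
    return n - k + 1


def pool_out(n, window=2, rounding=math.ceil):
    """Side length after window x window subsampling with the given rounding."""
    return rounding(n / window)


def trace(input_size, c1_kernel, c3_kernel):
    """Return the side lengths after C1, S2, C3 and S4."""
    c1 = conv_out(input_size, c1_kernel)
    s2 = pool_out(c1)
    c3 = conv_out(s2, c3_kernel)
    s4 = pool_out(c3)
    return c1, s2, c3, s4


# (C1 kernel side, C3 kernel side) for the structures checked here.
structures = {1: (3, 3), 3: (3, 4), 4: (4, 4), 5: (5, 5), 6: (6, 6)}
for number, (k1, k3) in structures.items():
    print(number, trace(20, k1, k3))
```

The C5 kernel in each row has the same side length as the S4 output, so every C5 feature map reduces to a single value, which matches the 80×1 output column.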

No.	Detection rate	False positive rate	False negative rate	Training time/s	Test time/s
1	0.9251	0.0739	0.0728	1023	63
2	0.9216	0.0772	0.0614	1064	61
3	0.9367	0.0891	0.0701	1013	68
4	0.8823	0.1214	0.0914	988	64
5	0.8692	0.1184	0.0995	956	67
6	0.8271	0.1369	0.1128	899	72