Lightweight malicious domain name detection model based on separable convolution

Chinese Journal of Network and Information Security, 2020, 6(6): 112-120. doi: 10.11959/j.issn.2096-109x.2020084

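Luhui YANG¹, Huiwen BAI¹, Guangjie LIU¹,², Yuewei DAI¹,²

¹ School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China
² School of Electronic & Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China

Revised: 2020-05-21  Online: 2020-12-15  Published: 2020-12-16
About the authors:
Luhui YANG (1992- ), male, born in Lichuan, Jiangxi, is a Ph.D. candidate at Nanjing University of Science and Technology. His main research interest is network and information security.
Huiwen BAI (1992- ), male, born in Baishan, Jilin, is a Ph.D. candidate at Nanjing University of Science and Technology. His main research interest is network traffic analysis.
Guangjie LIU (1980- ), male, born in Xuzhou, Jiangsu, Ph.D., is a professor and doctoral supervisor at Nanjing University of Information Science and Technology. His main research interests are information security, multimedia systems, and deep learning.
Yuewei DAI (1962- ), male, born in Zhenjiang, Jiangsu, Ph.D., is a professor and doctoral supervisor at Nanjing University of Information Science and Technology. His main research interest is network and multimedia information security.
Supported by:
The National Natural Science Foundation of China (U1836104)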


Abstract:

Deep-learning-based malicious domain name detection methods are computationally expensive, which makes them difficult to apply to domain name detection in real networks; a practical detector must balance accuracy against computation speed. To address this, a lightweight malicious domain name detection model based on separable convolution was proposed. The model uses a separable convolution structure: it first applies a depthwise convolution to each input channel separately, and then a pointwise convolution across all of the resulting channels. This reduces the number of convolution parameters without weakening feature extraction, so the convolution runs faster while accuracy stays high. To reduce the effect on classification accuracy of the imbalance between positive and negative samples, both in their number and in their difficulty, a focal loss function was introduced into the training process. The proposed algorithm was compared with three typical deep-learning-based detection models on a public data set. Experimental results show that it achieves detection accuracy close to the state-of-the-art model while significantly improving inference speed on CPU.

Key words: separable convolution, domain generation algorithm, deep learning, cyber security
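The focal loss mentioned in the abstract targets both imbalances at once: a class weight alpha handles the count imbalance, and a modulating factor (1 - p_t)^gamma shrinks the loss of easy, well-classified samples. The following is a minimal pure-Python sketch of the binary focal loss of Lin et al. [16] for the model's single sigmoid output. The label convention (1 = malicious) and the values alpha = 0.25 and gamma = 2 are assumptions taken from the focal loss paper, not values reported here; the function names are illustrative.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy, included for comparison."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        p_t = p if y == 1 else 1.0 - p    # probability assigned to the true class
        total += -math.log(p_t)
    return total / len(y_true)


def binary_focal_loss(y_true, p_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    y_true: labels in {0, 1}; 1 is assumed to mean "malicious (DGA)".
    p_pred: sigmoid outputs, i.e. the predicted probability of label 1.
    alpha, gamma: assumed defaults from Lin et al., not values from this paper.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
        # (1 - p_t)**gamma shrinks the loss of easy, confidently correct samples
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


if __name__ == "__main__":
    easy, hard = ([1], [0.95]), ([1], [0.30])
    for name, (y, p) in (("easy", easy), ("hard", hard)):
        print(name, binary_cross_entropy(y, p), binary_focal_loss(y, p))
```

With gamma = 2, a malicious sample predicted at 0.95 keeps only (1 - 0.95)^2 = 0.0025 of its cross-entropy (before alpha weighting), while one predicted at 0.30 keeps 0.49 of it, so gradient updates concentrate on hard examples. Setting gamma = 0 and alpha = 0.5 reduces the focal loss to half the ordinary binary cross-entropy.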

CLC number: TP309

Luhui YANG, Huiwen BAI, Guangjie LIU, Yuewei DAI. Lightweight malicious domain name detection model based on separable convolution[J]. Chinese Journal of Network and Information Security, 2020, 6(6): 112-120.


References

[1] YADAV S, REDDY A K K, REDDY A L N, et al. Detecting algorithmically generated malicious domain names[C]// Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement 2010. 2010: 48-61.
[2] YADAV S, REDDY A K K, REDDY A L N, et al. Detecting algorithmically generated domain-flux attacks with DNS traffic analysis[J]. IEEE/ACM Transactions on Networking, 2012, 20(5): 1663-1677.
[3] BILGE L, SEN S, BALZAROTTI D, et al. EXPOSURE: a passive DNS analysis service to detect and report malicious domains[J]. ACM Transactions on Information and System Security (TISSEC), 2014, 16(4): 1-28.
[4] YANG L, ZHAI J, LIU W, et al. Detecting word-based algorithmically generated domains using semantic analysis[J]. Symmetry, 2019, 11(2): 176.
[5] SCHIAVONI S, MAGGI F, CAVALLARO L, et al. Tracking and characterizing botnets using automatically generated domains[J]. Computer Science, 2013(2): 217-248.
[6] SCHIAVONI S, MAGGI F, CAVALLARO L, et al. Phoenix: DGA-based botnet tracking and intelligence[C]// International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. 2014: 192-211.
[7] WOODBRIDGE J, ANDERSON H S, AHUJA A, et al. Predicting domain generation algorithms with long short-term memory networks[J]. arXiv preprint arXiv:1611.00791, 2016.
[8] YU B, GRAY D L, PAN J, et al. Inline DGA detection with deep networks[C]// 2017 IEEE International Conference on Data Mining Workshops (ICDMW). 2017.
[9] YU B, PAN J, HU J, et al. Character level based detection of DGA domain names[C]// 2018 International Joint Conference on Neural Networks (IJCNN). 2018: 1-8.
[10] TRAN D, MAC H, TONG V, et al. A LSTM based framework for handling multiclass imbalance in DGA botnet detection[J]. Neurocomputing, 2018, 275: 2401-2413.
[11] QIAO Y, ZHANG B, ZHANG W, et al. DGA domain name classification method based on long short term memory with attention mechanism[J]. Applied Sciences, 2019, 9: 4205.
[12] LECUN Y, BOSER B, DENKER J S, et al. Backpropagation applied to handwritten zip code recognition[J]. Neural Computation, 1989, 1(4): 541-551.
[13] KIM Y. Convolutional neural networks for sentence classification[J]. arXiv preprint arXiv:1408.5882, 2014.
[14] ZHANG X, ZHAO J, LECUN Y. Character-level convolutional networks for text classification[C]// Advances in Neural Information Processing Systems. 2015: 649-657.
[15] HOWARD A G, ZHU M, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
[16] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, (99): 2999-3007.

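Network structure of the proposed model
Layer type	Input size	Output size	Parameters
Input layer (Input)	1×128	128×128	
Embedding layer (Embedding)	128×128	128×128	
1D separable convolution layer (SeparableConv1D)	128×128	128×128	Kernel_size=5, Stride=1
Dropout layer (Dropout)	128×128	128×128	Dropout_rate=0.5
Flatten layer (Flatten)	128×128	1×16384	
Fully connected layer (Dense)	1×16384	1×128	
Dropout layer (Dropout)	1×128	1×128	Dropout_rate=0.5
Fully connected layer (Dense)	1×128	1×1	
Activation layer (Activation)	1×1	1×1	Activation='sigmoid'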
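The structure table contains a single SeparableConv1D layer with 128 input channels (the embedding width), 128 output channels, kernel size 5 and stride 1. The pure-Python sketch below illustrates the depthwise separable convolution popularized by MobileNets [15]: a per-channel depthwise filter followed by a 1×1 pointwise filter. It checks that the result equals a standard convolution whose kernel is factorized as W[k][c][o] = D[k][c]·P[c][o], and compares parameter counts. 'Same' zero padding, a depth multiplier of 1 and one bias per output channel are assumptions the table does not state; the function names are illustrative.

```python
def separable_conv1d(x, depth_kernel, point_weights, bias):
    """Depthwise separable 1-D convolution, stride 1, 'same' zero padding (odd K).

    x:             [L][C_in]     input sequence (e.g. embedded characters)
    depth_kernel:  [K][C_in]     one length-K filter per input channel
    point_weights: [C_in][C_out] 1x1 (pointwise) convolution weights
    bias:          [C_out]       one bias per output channel (assumption)
    """
    length, c_in = len(x), len(x[0])
    k_size = len(depth_kernel)
    pad = (k_size - 1) // 2
    # Depthwise step: each channel is filtered independently, no channel mixing.
    depthwise = [[0.0] * c_in for _ in range(length)]
    for t in range(length):
        for k in range(k_size):
            s = t + k - pad
            if 0 <= s < length:
                for c in range(c_in):
                    depthwise[t][c] += x[s][c] * depth_kernel[k][c]
    # Pointwise step: a 1x1 convolution mixes channels at every position.
    c_out = len(bias)
    return [[bias[o] + sum(depthwise[t][c] * point_weights[c][o] for c in range(c_in))
             for o in range(c_out)] for t in range(length)]


def standard_conv1d(x, kernel, bias):
    """Standard 1-D convolution with kernel [K][C_in][C_out], same padding/stride."""
    length, c_in = len(x), len(x[0])
    k_size, c_out = len(kernel), len(bias)
    pad = (k_size - 1) // 2
    out = [[bias[o] for o in range(c_out)] for _ in range(length)]
    for t in range(length):
        for k in range(k_size):
            s = t + k - pad
            if 0 <= s < length:
                for c in range(c_in):
                    for o in range(c_out):
                        out[t][o] += x[s][c] * kernel[k][c][o]
    return out


def separable_params(k_size, c_in, c_out):
    return k_size * c_in + c_in * c_out + c_out   # depthwise + pointwise + bias


def standard_params(k_size, c_in, c_out):
    return k_size * c_in * c_out + c_out          # full kernel + bias


if __name__ == "__main__":
    import random
    random.seed(0)
    L, C_IN, C_OUT, K = 6, 3, 4, 5
    x = [[random.uniform(-1, 1) for _ in range(C_IN)] for _ in range(L)]
    d = [[random.uniform(-1, 1) for _ in range(C_IN)] for _ in range(K)]
    p = [[random.uniform(-1, 1) for _ in range(C_OUT)] for _ in range(C_IN)]
    b = [random.uniform(-1, 1) for _ in range(C_OUT)]
    # Separable conv == standard conv with the factorized kernel D[k][c] * P[c][o].
    w = [[[d[k][c] * p[c][o] for o in range(C_OUT)] for c in range(C_IN)] for k in range(K)]
    sep, std = separable_conv1d(x, d, p, b), standard_conv1d(x, w, b)
    print(max(abs(sep[t][o] - std[t][o]) for t in range(L) for o in range(C_OUT)))
    print(separable_params(5, 128, 128), standard_params(5, 128, 128))
```

For the layer in the table, the separable form needs 5×128 + 128×128 + 128 = 17,152 parameters, against 5×128×128 + 128 = 82,048 for a standard Conv1D of the same shape, about 4.8 times fewer. The factorization restricts the kernel to rank-one products per channel, which is the trade-off that buys the speed.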

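Data set composition
Label	Description	Count
Benign domains	Samples from the DNS request whitelist collected by Cisco	400 000
Malicious domains	DGA samples generated by 20 malware families: Gameover, Murofet, Dircrypt, Tinba, Necurs, Ramdo, Ranbyus, Cryptolocker, Emotet, Corebot, Banjori, Qakbot, Rovnix, Kraken, Ramnit, Locky, Pykspa, Simda, Symmi, Virut	100 000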
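The data set is imbalanced 4:1 (400 000 benign versus 100 000 DGA domains), and the structure table expects each domain as a 1×128 integer sequence feeding the Embedding layer. A minimal sketch of that preprocessing follows. The character vocabulary, lower-casing, left zero-padding and keeping the rightmost 128 characters are all assumptions for illustration; `encode_domain` and `build_dataset` are hypothetical helpers, not the authors' code.

```python
MAX_LEN = 128  # matches the 1x128 input size in the structure table

# Hypothetical character vocabulary; the paper's own mapping is not given here.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-._"
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(VOCAB)}  # 0 is reserved for padding
UNKNOWN_INDEX = len(VOCAB) + 1                              # any other character


def encode_domain(domain, max_len=MAX_LEN):
    """Map a domain name to a fixed-length list of integer character indices."""
    ids = [CHAR_TO_INDEX.get(ch, UNKNOWN_INDEX) for ch in domain.lower()]
    ids = ids[-max_len:]                      # truncate long names, keeping the right end
    return [0] * (max_len - len(ids)) + ids   # left-pad short names with zeros


def build_dataset(benign_domains, malicious_domains):
    """Return (X, y) with label 0 = benign (whitelist) and 1 = malicious (DGA)."""
    domains = list(benign_domains) + list(malicious_domains)
    X = [encode_domain(d) for d in domains]
    y = [0] * len(benign_domains) + [1] * len(malicious_domains)
    return X, y
```

Under this vocabulary the Embedding layer would need an input dimension of `len(VOCAB) + 2` (41), covering the padding and unknown indices. Because benign samples outnumber DGA samples four to one, the labels produced here are exactly the kind of skewed targets the focal loss is meant to rebalance.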

Comparison of detection performance and CPU inference time
Algorithm	Recall	Mean accuracy	AUC	CPU inference time (ms)
Ref. [8]	94.06%	96.61%	0.9968	1.05
Ref. [7]	95.36%	96.97%	0.9960	2.07
Ref. [10]	96.26%	97.51%	0.9971	2.09
Proposed	96.75%	97.46%	0.9971	0.60
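The speedups implied by the comparison table can be computed directly from the reported CPU inference times. This is plain arithmetic on the table's numbers; no new measurements are involved, and `speedup` is an illustrative helper.

```python
# Reported CPU inference times (ms), copied from the comparison table above.
REPORTED_CPU_MS = {"Ref. [8]": 1.05, "Ref. [7]": 2.07, "Ref. [10]": 2.09}
PROPOSED_CPU_MS = 0.60


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


for name, ms in REPORTED_CPU_MS.items():
    print(f"Proposed vs {name}: {speedup(ms, PROPOSED_CPU_MS):.2f}x faster on CPU")
```

The ratios are about 1.75×, 3.45× and 3.48×. The accuracy cost is small: the proposed model's mean accuracy (97.46%) is 0.05 percentage points below Ref. [10] (97.51%), while its recall (96.75%) is the highest of the four and its AUC (0.9971) ties the best baseline.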
