Journal on Communications ›› 2022, Vol. 43 ›› Issue (10): 12-25. doi: 10.11959/j.issn.1000-436x.2022180
Yifeng WANG1, Yuanbo GUO1, Qingli CHEN1, Chen FANG1, Renhao LIN2
Revised:
2022-08-29
Online:
2022-10-25
Published:
2022-10-01
About the author:
Yifeng WANG (1994- ), male, born in Taixing, Jiangsu, is a Ph.D. candidate at Information Engineering University. His main research interests include zero-shot learning, network security, and intrusion detection.
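Abstract:
To cope with the continual emergence of unknown network threats and increasingly sophisticated evasion attacks, a fine-grained unknown malicious network traffic classification method based on contrastive learning is proposed for the malicious traffic classification problem. The method is built on a variational auto-encoder and consists of two stages, known traffic classification and unknown traffic classification, which classify known and unknown malicious traffic using cross-entropy and reconstruction error, respectively. Unlike conventional methods, it incorporates contrastive learning into each training stage to improve classification performance on few-shot and unknown malicious traffic classes. Retraining and resampling are also integrated to further improve classification accuracy and generalization on few-shot classes. Experimental results show that the proposed method improves the macro-average recall of fine-grained classification by 20.3% on few-shot classes and by 9.1% on unknown malicious classes, and greatly mitigates evasion attacks on some classes.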
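The macro-average recall quoted in the abstract gives every class equal weight, which is why it is sensitive to few-shot and unknown classes that plain accuracy would hide. The following is a minimal sketch in plain Python; the function name `macro_recall` and the toy labels are illustrative assumptions, not the paper's code or results.

```python
from collections import defaultdict


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)    # correctly predicted samples per true class
    totals = defaultdict(int)  # total samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy labels: one large class and one small class.
y_true = ["neptune"] * 8 + ["rootkit"] * 2
y_pred = ["neptune"] * 8 + ["neptune", "rootkit"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                      # dominated by the large class
print(macro_recall(y_true, y_pred))  # each class counts equally
```

In this toy case a single missed sample of the small class lowers the macro-average recall far more than it lowers accuracy, which is the behaviour the metric is chosen for.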
Cite this article:
Yifeng WANG, Yuanbo GUO, Qingli CHEN, Chen FANG, Renhao LIN. Method based on contrastive learning for fine-grained unknown malicious traffic classification[J]. Journal on Communications, 2022, 43(10): 12-25.
Table 1
Data distribution of the fine-grained classes in the NSL-KDD dataset
The coarse-grained class label appears once per group, on a row near the middle of the group.
Coarse-grained class | Fine-grained class | Training set count | Test set count | Category in this paper | Training samples used in this paper |
Benign | benign | 67 343 | 9 711 | Known | 67 343 |
 | apache2 | 0 | 737 | Unknown | 0 |
 | back | 956 | 359 | Known | 956 |
 | land | 18 | 7 | Unknown | 0 |
 | neptune | 41 214 | 4 657 | Known | 41 214 |
DoS | mailbomb | 0 | 293 | Unknown | 0 |
 | pod | 201 | 41 | Unknown | 0 |
 | processtable | 0 | 685 | Unknown | 0 |
 | smurf | 2 646 | 665 | Known | 2 646 |
 | teardrop | 892 | 12 | Unknown | 0 |
 | udpstorm | 0 | 2 | Unknown | 0 |
 | ipsweep | 3 599 | 141 | Known | 3 599 |
 | mscan | 0 | 996 | Unknown | 0 |
Probe | nmap | 1 493 | 73 | Unknown | 0 |
 | portsweep | 2 931 | 157 | Known | 2 931 |
 | saint | 0 | 319 | Unknown | 0 |
 | satan | 3 633 | 735 | Known | 3 633 |
 | buffer_overflow | 30 | 20 | Few-shot | 5 |
 | httptunnel | 0 | 133 | Unknown | 0 |
 | loadmodule | 9 | 2 | Unknown | 0 |
U2R | perl | 3 | 2 | Unknown | 0 |
 | ps | 0 | 15 | Unknown | 0 |
 | xterm | 0 | 13 | Unknown | 0 |
 | rootkit | 10 | 13 | Few-shot | 5 |
 | sqlattack | 0 | 2 | Unknown | 0 |
 | worm | 0 | 2 | Unknown | 0 |
 | ftp_write | 8 | 3 | Unknown | 0 |
 | guess_passwd | 53 | 1 231 | Known | 53 |
 | imap | 11 | 1 | Unknown | 0 |
 | multihop | 7 | 18 | Few-shot | 5 |
 | named | 0 | 17 | Unknown | 0 |
 | phf | 4 | 2 | Unknown | 0 |
R2L | sendmail | 0 | 14 | Unknown | 0 |
 | snmpgetattack | 0 | 178 | Unknown | 0 |
 | snmpguess | 0 | 331 | Unknown | 0 |
 | spy | 2 | 0 | Unknown | 0 |
 | warezclient | 890 | 0 | Known | 890 |
 | warezmaster | 20 | 944 | Few-shot | 5 |
 | xsnoop | 0 | 4 | Unknown | 0 |
 | xlock | 0 | 9 | Unknown | 0 |
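The last two columns of Table 1 imply a simple selection rule for the training split: known classes keep all of their training samples, few-shot classes keep 5, and unknown classes contribute none. Below is a minimal sketch of that rule in Python. The function name `select_training_samples`, the category strings, and the use of seeded random sampling to pick the 5 few-shot samples are assumptions for illustration; the page does not state how the 5 samples were chosen.

```python
import random

FEW_SHOT_CAP = 5  # Table 1 lists 5 training samples for every few-shot class


def select_training_samples(samples_by_class, category_by_class, seed=0):
    """Return the samples kept for training, per fine-grained class.

    known    -> keep every sample
    few-shot -> keep at most FEW_SHOT_CAP samples (random choice is an assumption)
    unknown  -> keep nothing; the class is only seen at test time
    """
    rng = random.Random(seed)
    selected = {}
    for name, samples in samples_by_class.items():
        category = category_by_class[name]
        if category == "known":
            selected[name] = list(samples)
        elif category == "few-shot":
            k = min(FEW_SHOT_CAP, len(samples))
            selected[name] = rng.sample(list(samples), k)
        else:
            selected[name] = []
    return selected


# Placeholder sample ids sized to three rows of Table 1 (back, rootkit, nmap).
samples_by_class = {
    "back": range(956),
    "rootkit": range(10),
    "nmap": range(1493),
}
category_by_class = {"back": "known", "rootkit": "few-shot", "nmap": "unknown"}

selected = select_training_samples(samples_by_class, category_by_class)
print({name: len(kept) for name, kept in selected.items()})
```

Fixing the seed keeps the few-shot subset reproducible across runs, which matters when only 5 samples per class are used.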