Unbalanced protocol recognition method based on improved residual U-Net

doi:10.11959/j.issn.2096-109x.2024004

Abstract:

As the Internet continues to develop, network attacks are becoming more frequent and pose a major challenge to network security. Malicious traffic usually accounts for only a small share of captured network traffic, which means that the communication protocols used by attackers tend to be minority-class protocols. When the class distribution of protocol data is imbalanced, existing protocol recognition methods can identify majority-class protocols but struggle to identify minority-class protocols accurately. To address this problem, an unbalanced protocol recognition method based on an improved residual U-Net is proposed. A new activation function and SE-Net (squeeze-and-excitation networks) are used to improve the residual U-Net and strengthen its feature extraction capability. A weighted Dice loss function serves as the loss function of the protocol recognition model: low recognition accuracy on minority-class protocols raises the loss value, so the minority-class protocols dominate the direction of model optimization. To recognize protocols with the proposed method, network flows are first extracted from the network traffic and preprocessed into one-dimensional matrices; the protocol recognition model then extracts features from the protocol data, and a Softmax classifier outputs the protocol type. Experimental results show that, compared with the comparison models, the proposed protocol recognition model recognizes minority-class protocols more accurately while also improving the recognition accuracy of majority-class protocols.

Key words: protocol recognition, class unbalance, convolutional neural network, activation function, loss function
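
The effect of the weighted Dice loss described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so the per-class soft Dice definition, the smoothing term `eps`, the function names, and the class weights used here are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax over class scores (numerically stabilized)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def weighted_dice_loss(probs, labels, class_weights, eps=1e-6):
    """Class-weighted soft Dice loss (assumed formulation).

    probs:         (n_samples, n_classes) predicted probabilities
    labels:        (n_samples,) integer class indices
    class_weights: (n_classes,) non-negative weights; larger values make
                   errors on that class count more (hypothetical values)
    """
    num_classes = probs.shape[1]
    one_hot = np.eye(num_classes)[labels]
    # Per-class overlap between predictions and ground truth.
    intersection = (probs * one_hot).sum(axis=0)
    denominator = probs.sum(axis=0) + one_hot.sum(axis=0)
    dice = (2.0 * intersection + eps) / (denominator + eps)
    weights = np.asarray(class_weights, dtype=float)
    # Weighted average of (1 - Dice) over classes.
    return float((weights * (1.0 - dice)).sum() / weights.sum())


# Toy batch: class 2 is the minority class with a single sample.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 2])
perfect = np.eye(3)[labels]

# Same predictions, except the minority sample is assigned to class 0.
missed_minority = perfect.copy()
missed_minority[7] = [1.0, 0.0, 0.0]

uniform_loss = weighted_dice_loss(missed_minority, labels, [1, 1, 1])
weighted_loss = weighted_dice_loss(missed_minority, labels, [1, 1, 10])
print(uniform_loss, weighted_loss)
```

With a larger weight on the minority class, the single missed minority sample produces a noticeably higher loss than with uniform weights, which is the mechanism the abstract relies on to steer optimization toward minority-class protocols.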

CLC number: TP398.08

Jisheng WU, Zheng HONG, Tiantian MA. Unbalanced protocol recognition method based on improved residual U-Net[J]. Chinese Journal of Network and Information Security, 2024, 10(1): 136-155.



Protocol	Number of network flows	Proportion	Imbalance coefficient
BitTorrent	2 282	1.82%	8.76
DNS	17 359	13.88%	1.15
FTP	19 289	15.42%	1.04
HTTP	12 808	10.24%	1.56
IMAP	19 515	15.60%	1.02
POP	17 466	13.97%	1.14
SMB	19 998	15.99%	1.00
SMTP	1 497	1.20%	13.36
SSH	14 850	11.87%	1.35
Total	125 064	100%	-
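
The imbalance coefficients in the table above are consistent with dividing the flow count of the largest class by the flow count of each class. This definition is inferred from the numbers and is not stated in this excerpt, so the short sketch below, including the function name `imbalance_coefficients`, should be read as an assumption-based check rather than the authors' definition.

```python
# Flow counts copied from the table above.
flow_counts = {
    "BitTorrent": 2282,
    "DNS": 17359,
    "FTP": 19289,
    "HTTP": 12808,
    "IMAP": 19515,
    "POP": 17466,
    "SMB": 19998,
    "SMTP": 1497,
    "SSH": 14850,
}


def imbalance_coefficients(counts):
    """Assumed definition: largest class count divided by each class count."""
    largest = max(counts.values())
    return {name: round(largest / n, 2) for name, n in counts.items()}


def proportions(counts):
    """Share of each class in percent, rounded to two decimals."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}


coefficients = imbalance_coefficients(flow_counts)
shares = proportions(flow_counts)
for name in flow_counts:
    print(f"{name}\t{flow_counts[name]}\t{shares[name]}%\t{coefficients[name]}")
```

Under this reading, the largest class has a coefficient of 1 and rarer protocols receive larger values.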

Protocol	Number of network flows	Proportion	Imbalance coefficient
BitTorrent	19 855	20.43%	1.00
DNS	10 000	10.29%	1.99
FTP	18 348	18.88%	1.08
HTTP	6 392	6.58%	3.11
IMAP	18 081	18.61%	1.10
POP	1 599	1.65%	12.42
SMB	1 455	1.50%	13.65
SMTP	11 526	11.86%	1.72
SSH	9 914	10.20%	2.00
Total	97 170	100%	-

Loss function	ISCX2012 protocol recognition accuracy	ISCX2012 classification time/s	UNSW-NB15 protocol recognition accuracy	UNSW-NB15 classification time/s
WeightedCrossEntropy	98.32%	57	98.64%	45
Dice	98.60%	59	98.63%	48
WeightedDice	98.68%	60	98.74%	43

Model	ISCX2012 protocol recognition accuracy	ISCX2012 classification time/s	UNSW-NB15 protocol recognition accuracy	UNSW-NB15 classification time/s
Model1	98.60%	58	98.67%	39
Model2	98.61%	60	98.70%	38
Model3	98.63%	68	98.70%	45
Model4	98.66%	61	98.76%	44

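Model	ISCX2012 protocol recognition accuracy	ISCX2012 classification time/s	UNSW-NB15 protocol recognition accuracy	UNSW-NB15 classification time/s
Model5	98.54%	47	98.69%	41
Proposed model	98.65%	59	98.75%	42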