Chinese Journal of Network and Information Security ›› 2022, Vol. 8 ›› Issue (6): 84-91. doi: 10.11959/j.issn.2096-109x.2022077

• Papers •

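Encrypted traffic classification method based on convolutional neural network

Rongna XIE, Zhuhong MA, Zongyu LI, Ye TIAN

  1. Beijing Electronic Science and Technology Institute, Beijing 100070, China
  • Revised: 2022-08-14 Online: 2022-12-15 Published: 2023-01-16
  • About the authors: Rongna XIE (1976- ), female, born in Yongji, Shanxi, professor at Beijing Electronic Science and Technology Institute; main research interests: network and system security, access control, and cryptographic engineering
    Zhuhong MA (1999- ), male, born in Dengzhou, Henan, master's student at Beijing Electronic Science and Technology Institute; main research interest: information security
    Zongyu LI (1999- ), female, born in Shangcai County, Henan, master's student at Beijing Electronic Science and Technology Institute; main research interest: information security
    Ye TIAN (1997- ), male, born in Chifeng, Inner Mongolia, master's student at Beijing Electronic Science and Technology Institute; main research interest: information security
  • Supported by:
    The National Key R&D Program of China (2017YFB0801803)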

Abstract:

Traditional encrypted network traffic classification methods suffer from low accuracy, weak generality, and a tendency to violate privacy. To address these problems, an encrypted traffic classification method based on a convolutional neural network was proposed; it avoids relying on raw traffic data and prevents overfitting to the byte structure of specific applications. Using the packet size and arrival time information of network traffic, a method was designed to convert raw traffic into a two-dimensional image: each cell of the resulting histogram represents the number of packets of the corresponding size that arrive in the corresponding time interval. Because the method does not rely on packet payloads, it avoids privacy violations. The LeNet-5 convolutional neural network model was optimized to improve classification accuracy: an Inception module was embedded for multi-dimensional feature extraction and feature fusion, and 1×1 convolution was used to control the output feature dimension. In addition, an average pooling layer and a convolutional layer replaced the fully connected layer to increase computation speed and avoid overfitting. The sliding window method from object detection tasks was used to divide each unidirectional network flow into equal-sized blocks, ensuring that, within a single session, the blocks in the training set and the blocks in the test set do not overlap, and expanding the number of dataset samples. Classification experiments on the ISCX dataset show that, for the application traffic classification task, the average accuracy exceeds 95%. Comparative experiments show that when the training set and the test set are of different types, traditional classification methods suffer a significant drop in accuracy or even fail, whereas the proposed method still reaches an accuracy of 89.2%, demonstrating that the method is applicable to both encrypted and non-encrypted traffic. All experiments were based on imbalanced datasets, and the results may improve further if the datasets are balanced.

Key words: encrypted traffic, convolutional neural network, deep learning, feature fusion, model optimization
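
The two preprocessing ideas described in the abstract, splitting a unidirectional flow into equal-sized blocks with a sliding window and turning each block into a two-dimensional histogram of packet size against arrival time, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the fixed-duration interpretation of "equal-sized blocks", the 1500-byte size cap, and the small 4×4 image resolution are all assumptions chosen for illustration; the abstract does not state these values.

```python
from typing import List, Sequence, Tuple

# One packet: (arrival time in seconds, packet size in bytes). No payload is used.
Packet = Tuple[float, int]


def sliding_windows(packets: Sequence[Packet], window: float,
                    stride: float) -> List[List[Packet]]:
    """Split one time-sorted unidirectional flow into fixed-duration blocks.

    Assumption: blocks are defined by duration. With stride == window the
    blocks share no packets, so they can be assigned to the training or test
    set without overlap. Times in each block are re-based to the block start.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if not packets:
        return []
    start, end = packets[0][0], packets[-1][0]
    blocks: List[List[Packet]] = []
    k = 0
    while start + k * stride <= end:
        t0 = start + k * stride
        block = [(ts - t0, size) for ts, size in packets if t0 <= ts < t0 + window]
        if block:  # skip empty windows
            blocks.append(block)
        k += 1
    return blocks


def flow_to_histogram(block: Sequence[Packet], window: float,
                      max_size: int = 1500, bins: int = 4) -> List[List[int]]:
    """Build a bins x bins grid for one block.

    grid[row][col] counts the packets whose size falls in size bin `row`
    and whose arrival offset falls in time bin `col`.
    max_size and bins are illustrative assumptions.
    """
    grid = [[0] * bins for _ in range(bins)]
    for offset, size in block:
        col = min(int(offset / window * bins), bins - 1)
        row = min(int(min(size, max_size) / max_size * bins), bins - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    # Hypothetical flow used only to exercise the functions.
    flow = [(0.0, 100), (1.0, 1500), (2.5, 700), (5.0, 60), (9.0, 1400)]
    for block in sliding_windows(flow, window=4.0, stride=4.0):
        for row in flow_to_histogram(block, window=4.0):
            print(row)
        print()
```

Each resulting grid plays the role of one input image for the classifier. Choosing a stride smaller than the window would produce more samples per flow, but the blocks would then share packets, so non-overlapping strides are the simpler way to keep training and test blocks disjoint.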
