Telecommunications Science ›› 2023, Vol. 39 ›› Issue (7): 80-89. doi: 10.11959/j.issn.1000-0801.2023138

• Research and Development •

A network traffic classification method based on random forest and improved convolutional neural network

Bensheng YUN, Xiaoya GAN, Yaguan QIAN

  1. School of Science, Zhejiang University of Science and Technology, Hangzhou 310023, China
  • Revised:2023-07-02 Online:2023-07-20 Published:2023-07-01
  • About the authors: Bensheng YUN (1980- ), male, Ph.D., associate professor at the School of Science, Zhejiang University of Science and Technology. His main research interests are big data analysis and mining, and machine learning.
    Xiaoya GAN (2000- ), female, student at the School of Science, Zhejiang University of Science and Technology. Her main research interest is big data analysis.
    Yaguan QIAN (1976- ), male, Ph.D., professor at the School of Science, Zhejiang University of Science and Technology. His main research interests are deep learning, artificial intelligence security, and big data processing.
  • Supported by:
    The National Natural Science Foundation of China (61972357); The Zhejiang Provincial Natural Science Foundation of China (LZ22F020007)

Abstract:

To improve the efficiency and reduce the complexity of network traffic classification models, a classification method based on random forest and an improved convolutional neural network was proposed. Firstly, random forest was used to evaluate the importance of each network traffic feature, and features were selected according to their importance ranking. Secondly, the AdamW optimizer and a triangular cyclic learning rate were adopted to optimize the convolutional neural network classification model. Finally, the model was deployed on a Spark cluster to parallelize model training. With a triangular cyclic learning rate of constant cycle amplitude, experiments using the 1024, 400, 256 and 100 most important features as input show that the model accuracy improves to 97.68%, 95.84%, 95.03% and 94.22%, respectively. With the 256 most important features, experiments with different learning rates show that the triangular cyclic learning rate with halved cycle amplitude works best: the model accuracy improves to 95.25%, and the training time is reduced by nearly half.

Key words: network traffic classification, random forest, convolutional neural network, Spark
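
The two learning-rate policies compared in the abstract can be illustrated with a minimal sketch. It assumes the commonly used triangular cyclic formulation, in which the rate rises linearly from a lower bound to an upper bound and back, and in which the "halved amplitude" variant halves the peak after each full cycle. The function name `triangular_clr` and the values of `base_lr`, `max_lr` and `step_size` are hypothetical; the abstract does not state the settings the authors used.

```python
import math


def triangular_clr(iteration, base_lr, max_lr, step_size, halve_amplitude=False):
    """Learning rate at `iteration` under a triangular cyclic policy.

    step_size is the number of iterations in half a cycle (assumed value).
    With halve_amplitude=True the peak height is halved after every full cycle.
    """
    # Index of the current cycle, starting at 1.
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Distance from the peak of the current cycle, in [0, 1].
    x = abs(iteration / step_size - 2 * cycle + 1)
    # Constant amplitude, or amplitude halved once per completed cycle.
    scale = 1.0 / (2 ** (cycle - 1)) if halve_amplitude else 1.0
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x) * scale


# Hypothetical settings, chosen only for illustration.
BASE_LR, MAX_LR, STEP = 0.001, 0.006, 4

constant = [triangular_clr(i, BASE_LR, MAX_LR, STEP) for i in range(4 * STEP + 1)]
halved = [triangular_clr(i, BASE_LR, MAX_LR, STEP, halve_amplitude=True)
          for i in range(4 * STEP + 1)]
```

In this sketch both schedules peak at `MAX_LR` in the first cycle. In the second cycle the constant-amplitude schedule returns to `MAX_LR`, while the halved-amplitude schedule peaks midway between `BASE_LR` and `MAX_LR`. In PyTorch, comparable behaviour is available through `torch.optim.lr_scheduler.CyclicLR` with `mode='triangular'` or `mode='triangular2'`, which can be paired with `torch.optim.AdamW`; whether the authors used these particular classes is not stated in the abstract.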
