Chinese Journal of Network and Information Security ›› 2023, Vol. 9 ›› Issue (1): 130-139. doi: 10.11959/j.issn.2096-109x.2023005

• Academic Papers

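IoT intrusion detection method for unbalanced samples

Tong PAN, Wei CHEN, Lifa WU

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  • Revised: 2022-11-12 Online: 2023-02-25 Published: 2023-02-01
  • About the authors: Tong PAN (1996- ), male, born in Yixing, Jiangsu, is a master's student at Nanjing University of Posts and Telecommunications. His main research interests are network attack traffic analysis and detection, and machine learning.
    Wei CHEN (1979- ), male, born in Huai'an, Jiangsu, is a professor at Nanjing University of Posts and Telecommunications. His main research interests are wireless network security and mobile Internet security.
    Lifa WU (1968- ), male, born in Qichun, Hubei, is a professor and doctoral supervisor at Nanjing University of Posts and Telecommunications. His main research interests are network protocol reverse engineering, software vulnerability discovery and reverse engineering, and intrusion detection.
  • Supported by:
    The National Key R&D Program of China (2019YFB2101704)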

Abstract:

As devices iterate, network traffic grows exponentially and attacks against all kinds of applications multiply, so identifying and classifying attack traffic at the traffic level is important. The rapid growth in Internet of Things (IoT) devices has likewise brought more attacks against these devices and increasing damage. IoT intrusion detection can pick out attack traffic from this massive volume of traffic, protect IoT devices at the traffic level, and block attacks. To address the low detection accuracy for many attack types and the sample imbalance problem, an intrusion detection model based on a resampled random forest (RF), named Resample-RF, is proposed. It comprises three algorithms: an optimal sample selection algorithm, a feature merging algorithm based on information entropy, and a multi-class greedy transformation algorithm. For unbalanced samples in the IoT environment, the optimal sample selection algorithm increases the weight of minority-class samples and thereby improves model accuracy. For the low efficiency of feature splitting in random forests, the entropy-based feature merging algorithm improves running efficiency. For the low accuracy of random forests in multi-class classification, the multi-class greedy transformation algorithm further improves accuracy. The model was evaluated on two public datasets, reaching an F1 score of 0.99 on the IoT-23 dataset and 1.0 on a Kaggle dataset. The experimental results indicate that the proposed model can effectively identify attack traffic within massive traffic, helping to defend applications against attackers and to protect IoT devices and their users.

Key words: traffic analysis, IoT, intrusion detection, random forest, unbalanced sample
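
The abstract names three ingredients without giving their details: rebalancing minority classes, information entropy, and the F1 score. The sketch below illustrates each in generic form using only the Python standard library. It is not the authors' Resample-RF: plain random oversampling stands in for the optimal sample selection algorithm, the entropy function is the ordinary Shannon entropy, and macro averaging is an assumption, since the abstract does not say how F1 is averaged. All function names and the 95/5 toy split are hypothetical.

```python
import math
import random
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes
    match the size of the largest class (a stand-in for resampling)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up flow labels: 95 benign flows and 5 attack flows.
labels = ["benign"] * 95 + ["attack"] * 5
rows = list(range(len(labels)))  # placeholder feature rows

bal_rows, bal_labels = oversample_minority(rows, labels)

print("class counts before:", Counter(labels))
print("class counts after: ", Counter(bal_labels))
print("label entropy before:", shannon_entropy(labels))
print("label entropy after: ", shannon_entropy(bal_labels))

# A classifier that always predicts the majority class is 95% accurate
# on the toy data, yet its macro F1 is poor because the attack class is missed.
always_benign = ["benign"] * len(labels)
print("macro F1 of majority-only predictor:", macro_f1(labels, always_benign))
```

The majority-only predictor shows why accuracy alone can mislead on unbalanced traffic data: it never detects an attack, and a per-class metric such as macro F1 exposes this.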

