Method of unknown protocol classification based on autoencoder

doi:10.11959/j.issn.1000-436x.2020123

Journal on Communications, 2020, Vol. 41, Issue (6): 88-97.

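Author biographies:
Chunxiang GU (1976- ), male, born in Huoshan, Anhui, Ph.D., professor and doctoral supervisor at Information Engineering University and director of the Henan Key Laboratory of Network Cryptography Technology. His main research interests are cryptography and network security.
Weisen WU (1996- ), male, born in Tiantai, Zhejiang, master's student at Information Engineering University. His main research interests are network security and machine learning.
Ya’nan SHI (1982- ), female, born in Anyang, Henan, lecturer at Information Engineering University. Her main research interest is security protocol analysis.
Guangsong LI (1977- ), male, born in Dezhou, Shandong, Ph.D., associate professor at Information Engineering University. His main research interests are network protocol analysis, blockchain, and wireless network security.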


Chunxiang GU¹,², Weisen WU¹, Ya’nan SHI¹, Guangsong LI¹

¹ School of Cyberspace Security, Information Engineering University, Zhengzhou 450001, China
² Henan Key Laboratory of Network Cryptography Technology, Zhengzhou 450001, China

Revised: 2020-04-03 Online: 2020-06-25 Published: 2020-07-04
Supported by:
National Natural Science Foundation of China (61772548); Innovative Research Groups of the National Natural Science Foundation of China (61521003); Foundation of Science and Technology on Information Assurance Laboratory (KJ-17-001)


Abstract:

The large number of unknown protocols on the Internet makes network management and the maintenance of network security very difficult. To address this problem, a method for classifying and identifying unknown protocols is proposed. The method combines an autoencoder with an improved K-means clustering technique to classify and identify unknown protocols in network traffic. The autoencoder reduces the dimensionality of the network traffic and extracts features from it, and clustering then classifies the reduced data without supervision, so that network traffic is identified and classified without labels. Experimental results show that the proposed method classifies better than the traditional K-means, DBSCAN, and GMM algorithms and is more efficient.

Keywords: unknown protocol classification, autoencoder, unsupervised classification, feature extraction
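The pipeline described in the abstract (an autoencoder for dimensionality reduction and feature extraction, followed by clustering of the reduced data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the network size, the synthetic data, and the names `train_autoencoder`, `encode`, and `kmeans` are all hypothetical, and the clustering step is plain Lloyd's K-means rather than the improved K-means the paper proposes.

```python
import numpy as np


def train_autoencoder(X, code_dim=2, epochs=1000, lr=0.02, seed=0):
    """Fit a tiny autoencoder (tanh encoder, linear decoder) by gradient descent.

    Returns the encoder parameters and the reconstruction loss per epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, code_dim))
    b1 = np.zeros(code_dim)
    W2 = rng.normal(0.0, 0.1, (code_dim, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # encoder: low-dimensional code
        R = H @ W2 + b2                     # decoder: reconstruction
        err = R - X
        losses.append(float((err ** 2).sum(axis=1).mean()))
        G = 2.0 * err / n                   # gradient of the loss w.r.t. R
        GH = (G @ W2.T) * (1.0 - H ** 2)    # backpropagate through tanh
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ GH)
        b1 -= lr * GH.sum(axis=0)
    return (W1, b1), losses


def encode(X, params):
    """Map inputs to their low-dimensional codes."""
    W1, b1 = params
    return np.tanh(X @ W1 + b1)


def kmeans(Z, k, iters=100, seed=0):
    """Plain Lloyd's K-means (not the paper's improved variant)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic stand-in for traffic (an assumption): two "protocols" whose
# 16 features follow different patterns, plus a little noise.
rng = np.random.default_rng(1)
proto_a = np.tile([0.9] * 8 + [0.1] * 8, (100, 1))
proto_b = np.tile([0.1] * 8 + [0.9] * 8, (100, 1))
X = np.vstack([proto_a, proto_b]) + rng.normal(0.0, 0.05, (200, 16))
X = X - X.mean(axis=0)                      # center features before training

params, losses = train_autoencoder(X)
codes = encode(X, params)                   # 16-D inputs reduced to 2-D codes
labels, centers = kmeans(codes, k=2)        # unsupervised grouping of the codes
print(codes.shape, losses[0], losses[-1], np.bincount(labels, minlength=2))
```

On real traffic the rows of `X` would be per-flow or per-packet feature vectors; how the paper constructs them is not stated on this page, so the synthetic matrix above is only a placeholder.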

CLC number: TP181

Chunxiang GU, Weisen WU, Ya’nan SHI, Guangsong LI. Method of unknown protocol classification based on autoencoder[J]. Journal on Communications, 2020, 41(6): 88-97.



Table 1
Dataset	DEC model (s)	K-means (s)	DBSCAN (s)	GMM (s)	PRISMA (s)
Dataset 1	5.66236	7.32522	180.05959	16.44828	5.87694
Dataset 2	4.18225	4.60108	27.05156	8.41399	4.53688
Dataset 3	0.68261	2.13234	3.96855	0.93493	2.12545
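To make the table above easier to compare, the short sketch below divides each column by the DEC model column, row by row. It assumes the values are running times in seconds, as the `/s` suffix in the header suggests; the page itself gives no caption, so that reading, the choice of the DEC model column as the reference, and the names `RUNTIMES_S` and `ratios_to_reference` are assumptions made only for illustration.

```python
# Values copied from the table above; the "/s" header suffix is read as seconds
# (an assumption, since the page shows no caption for the table).
RUNTIMES_S = {
    "Dataset 1": {"DEC model": 5.66236, "K-means": 7.32522, "DBSCAN": 180.05959,
                  "GMM": 16.44828, "PRISMA": 5.87694},
    "Dataset 2": {"DEC model": 4.18225, "K-means": 4.60108, "DBSCAN": 27.05156,
                  "GMM": 8.41399, "PRISMA": 4.53688},
    "Dataset 3": {"DEC model": 0.68261, "K-means": 2.13234, "DBSCAN": 3.96855,
                  "GMM": 0.93493, "PRISMA": 2.12545},
}


def ratios_to_reference(row, reference="DEC model"):
    """Return each other column's value divided by the reference column's value."""
    base = row[reference]
    return {name: value / base for name, value in row.items() if name != reference}


for dataset, row in RUNTIMES_S.items():
    ratios = ratios_to_reference(row)
    print(dataset, {name: round(r, 2) for name, r in ratios.items()})
```

A ratio above 1 means the column's value is larger than the DEC model value on that dataset.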

