Journal on Communications, 2018, Vol. 39, Issue (10): 155-165. doi: 10.11959/j.issn.1000-436x.2018224
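Jie WANG, Lili YANG, Min YANG
Revised: 2018-07-19
Online: 2018-10-01
Published: 2018-11-23
About the authors:
Jie WANG (1980-), female, born in Taojiang, Hunan, Ph.D., associate professor at Central South University. Her main research interests include network and information security. | Lili YANG (1992-), female, Bouyei ethnic group, born in Anshun, Guizhou, master's student at Central South University. Her main research interests include network and information security. | Min YANG (1993-), male, born in Nanchang, Jiangxi, master's student at Central South University. His main research interests include reinforcement learning.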
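Abstract:
In attack detection for big-data network environments, samples for some attack steps are missing, so the trained attack model is not accurate enough. Existing ensemble classifiers also have shortcomings when used to build multitier classifiers. To address both problems, a malicious network traffic detection method based on multitier ensemble classifiers is proposed. The method first uses an unsupervised learning framework to preprocess the data and group it into clusters, then applies noise handling to each cluster, and finally builds a multitier ensemble classifier, MLDE, to detect malicious network traffic. The MLDE framework uses base classifiers at the bottom tier and different ensemble meta-classifiers at the upper tiers. The framework is simple to build, can process large data sets concurrently, and can scale the ensemble according to the size of the data set. Experimental results show that when MLDE uses random forest at the base tier, a bagging ensemble classifier at the second tier, and an AdaBoost ensemble classifier at the third tier, the AUC reaches 0.999.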
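The tier structure described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a one-feature decision stump stands in for the random forest base classifier to keep the example short, AdaBoost is realized by weighted resampling, and the class names, `build_mlde`, `toy_flows`, `auc`, and the toy data are all hypothetical. The clustering and noise-handling preprocessing steps are omitted.

```python
import math
import random


class Stump:
    """Tier 1 stand-in: one-feature threshold rule (the paper uses random forest)."""

    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                for sign in (1, -1):
                    err = sum((sign if row[j] > t else -sign) != label
                              for row, label in zip(X, y))
                    if best is None or err < best[0]:
                        best = (err, j, t, sign)
        _, self.j, self.t, self.sign = best
        return self

    def predict(self, row):
        return self.sign if row[self.j] > self.t else -self.sign


class Bagging:
    """Tier 2: majority vote over learners trained on bootstrap samples."""

    def __init__(self, make_learner, n_models, rng):
        self.make_learner, self.n_models, self.rng = make_learner, n_models, rng

    def fit(self, X, y):
        self.models = []
        for _ in range(self.n_models):
            idx = [self.rng.randrange(len(X)) for _ in X]
            self.models.append(
                self.make_learner().fit([X[i] for i in idx], [y[i] for i in idx]))
        return self

    def predict(self, row):
        return 1 if sum(m.predict(row) for m in self.models) >= 0 else -1


class AdaBoost:
    """Tier 3: boosting by weighted resampling over tier-2 learners."""

    def __init__(self, make_learner, n_rounds, rng):
        self.make_learner, self.n_rounds, self.rng = make_learner, n_rounds, rng

    def fit(self, X, y):
        n = len(X)
        w = [1.0 / n] * n  # start from uniform sample weights
        self.models = []
        for _ in range(self.n_rounds):
            idx = self.rng.choices(range(n), weights=w, k=n)
            model = self.make_learner().fit([X[i] for i in idx], [y[i] for i in idx])
            miss = [model.predict(X[i]) != y[i] for i in range(n)]
            err = sum(wi for wi, bad in zip(w, miss) if bad)
            if err >= 0.5:  # no better than chance: stop boosting
                if not self.models:
                    self.models.append((1.0, model))
                break
            alpha = 0.5 * math.log((1 - err + 1e-10) / (err + 1e-10))
            self.models.append((alpha, model))
            if err == 0:  # perfect learner: nothing left to reweight
                break
            w = [wi * math.exp(alpha if bad else -alpha) for wi, bad in zip(w, miss)]
            total = sum(w)
            w = [wi / total for wi in w]
        return self

    def score(self, row):
        return sum(alpha * m.predict(row) for alpha, m in self.models)

    def predict(self, row):
        return 1 if self.score(row) >= 0 else -1


def build_mlde(seed=0, n_bags=5, n_rounds=5):
    """Stack the tiers: AdaBoost over Bagging over the base learner."""
    rng = random.Random(seed)
    return AdaBoost(lambda: Bagging(Stump, n_bags, rng), n_rounds, rng)


def auc(scores, labels):
    """Fraction of (malicious, benign) pairs ranked correctly; ties count half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == -1]
    return sum((p > q) + 0.5 * (p == q) for p in pos for q in neg) / (len(pos) * len(neg))


def toy_flows(seed=1, n=20):
    """Hypothetical [synflag, pkts] rows: benign = -1, malicious = +1."""
    rng = random.Random(seed)
    benign = [[rng.randint(0, 2), rng.randint(1, 20)] for _ in range(n)]
    malicious = [[rng.randint(5, 9), rng.randint(40, 60)] for _ in range(n)]
    return benign + malicious, [-1] * n + [1] * n
```

To try it, call `X, y = toy_flows()`, then `model = build_mlde().fit(X, y)`, and pass `[model.score(row) for row in X]` together with `y` to `auc`. Because each tier only needs `fit` and `predict` from the tier below, the ensemble can be scaled by changing `n_bags` and `n_rounds`, which mirrors the abstract's point about adjusting the ensemble to the size of the data set.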
Jie WANG, Lili YANG, Min YANG. Multitier ensemble classifiers for malicious network traffic detection[J]. Journal on Communications, 2018, 39(10): 155-165.
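Table 2. Flow feature attributes
Feature | Description |
pkts | Total number of packets |
pkt_noPayload | Total number of packets without payload |
bytes | Total number of bytes transferred |
pay_bytes | Total number of payload bytes |
duration | Flow duration |
maxsz | Maximum packet size |
minsz | Minimum packet size |
avfsz | Mean packet size |
stdsz | Standard deviation of packet size |
maxpy | Maximum payload size |
minpy | Minimum payload size |
avgpy | Mean payload size |
stdpy | Standard deviation of payload size |
synflag | Number of SYN flags |
rstfalg | Number of RST flags |
pushflag | Number of PSH flags |
finflag | Number of FIN flags |
ackflag | Number of ACK flags |
syn_ackflag | Number of SYN_ACK flags |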
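
The attributes in Table 2 can be computed from the packets of a single flow roughly as follows. This is a sketch under assumptions that the table itself does not settle: the packet record layout (`time`, `size`, `payload`, `flags`) and the function name `flow_features` are hypothetical, the standard deviations are taken as population values, and a SYN_ACK packet is also counted under `synflag` and `ackflag`. Dictionary keys are spelled exactly as in the table.

```python
import statistics


def flow_features(packets):
    """Compute the Table 2 attributes for one flow.

    `packets` is a hypothetical list of dicts with keys:
    time (seconds), size (bytes), payload (bytes), flags (set of str).
    """
    if not packets:
        raise ValueError("a flow needs at least one packet")

    sizes = [p["size"] for p in packets]
    payloads = [p["payload"] for p in packets]
    times = [p["time"] for p in packets]

    def count(*flags):
        # Number of packets that carry all of the given TCP flags.
        return sum(1 for p in packets if set(flags) <= p["flags"])

    return {
        "pkts": len(packets),
        "pkt_noPayload": sum(1 for b in payloads if b == 0),
        "bytes": sum(sizes),
        "pay_bytes": sum(payloads),
        "duration": max(times) - min(times),
        "maxsz": max(sizes),
        "minsz": min(sizes),
        "avfsz": statistics.mean(sizes),      # key spelled as in Table 2
        "stdsz": statistics.pstdev(sizes),    # population std (assumption)
        "maxpy": max(payloads),
        "minpy": min(payloads),
        "avgpy": statistics.mean(payloads),
        "stdpy": statistics.pstdev(payloads),
        "synflag": count("SYN"),
        "rstfalg": count("RST"),              # key spelled as in Table 2
        "pushflag": count("PSH"),
        "finflag": count("FIN"),
        "ackflag": count("ACK"),
        "syn_ackflag": count("SYN", "ACK"),
    }


# Hypothetical four-packet flow used only to exercise the function.
example_flow = [
    {"time": 0.0, "size": 60, "payload": 0, "flags": {"SYN"}},
    {"time": 0.1, "size": 60, "payload": 0, "flags": {"SYN", "ACK"}},
    {"time": 0.2, "size": 100, "payload": 40, "flags": {"ACK", "PSH"}},
    {"time": 0.5, "size": 60, "payload": 0, "flags": {"FIN", "ACK"}},
]
features = flow_features(example_flow)
```

Each flow then becomes one fixed-length numeric record, which is the form a clustering step or classifier would consume.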