Chinese Journal of Network and Information Security ›› 2017, Vol. 3 ›› Issue (8): 44-60. doi: 10.11959/j.issn.2096-109x.2017.00186
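Revised:
2017-07-22
Online:
2017-08-01
Published:
2017-12-26
About the authors:
Zheng-qi WANG (born 1992), male, from Zhenjiang, Jiangsu, is a master's student at the University of Science and Technology of China; his main research interest is network security. | Xiao-bing FENG (born 1992), female, from Liaocheng, Shandong, is a master's student at the University of Science and Technology of China; her main research interest is network security. | Chi ZHANG (born 1977), male, is an associate professor and doctoral supervisor at the University of Science and Technology of China; his main research interests are computer networks and information security.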
Zheng-qi WANG, Xiao-bing FENG, Chi ZHANG
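Abstract:
Traditional static malicious Web page detection schemes are under pressure from the massive number of newly created Web pages. To address this, a two-stage analysis and detection process is introduced, with a dedicated feature extraction scheme for each stage. By applying an optimized naive Bayes algorithm and a support vector machine (SVM) algorithm hierarchically, a malicious Web page detection system that balances efficiency and functionality, TSMWD (two-step malicious Web page detection system), is designed and implemented. The first-layer detector filters out the large volume of benign Web pages: it is efficient, fast, and easy to update and iterate, and it prioritizes the true positive rate. The second-layer detector targets detection performance: it demands high accuracy, with a more relaxed budget for time and resources. Experimental results show that the architecture increases detection speed while keeping overall detection accuracy essentially unchanged, so the system can serve more detection requests within a given time.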
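The two-step flow described in the abstract can be sketched as a classifier cascade. The following is a minimal illustration, not the authors' implementation: the function names, the callable interfaces, and the decision rule `lam * P(malicious) >= P(benign)` are assumptions. The abstract only states that a fast first layer filters benign pages and a more accurate second layer examines the rest; Tables 1 and 3 suggest that a larger λ forwards more pages to the second layer.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def make_cascade(
    stage1_prob_malicious: Callable[[Features], float],
    stage2_is_malicious: Callable[[Features], bool],
    lam: float = 1.4,
) -> Callable[[Features, Callable[[], Features]], bool]:
    """Build a two-stage detector (hypothetical interface).

    stage1_prob_malicious: cheap model, e.g. naive Bayes, returning P(malicious).
    stage2_is_malicious:   costly model, e.g. an SVM, returning the final verdict.
    lam:                   assumed bias factor; lam > 1 favours the malicious class.
    """

    def classify(cheap: Features, fetch_rich: Callable[[], Features]) -> bool:
        p_mal = stage1_prob_malicious(cheap)
        # Assumed stage-1 rule: treat the page as benign only when the
        # lam-weighted malicious score is still below the benign score.
        if lam * p_mal < 1.0 - p_mal:
            return False  # filtered out; rich features are never extracted
        # Suspicious pages pay for rich feature extraction and stage 2.
        return stage2_is_malicious(fetch_rich())

    return classify


# Toy stand-ins so the sketch runs end to end (not real models).
stage2_calls = []


def toy_stage1(cheap: Features) -> float:
    return cheap[0]  # pretend the first cheap feature is already P(malicious)


def toy_stage2(rich: Features) -> bool:
    stage2_calls.append(rich)
    return rich[0] > 0.5


strict = make_cascade(toy_stage1, toy_stage2, lam=1.0)
biased = make_cascade(toy_stage1, toy_stage2, lam=1.4)

page = [0.45]  # stage-1 score just under one half
verdict_strict = strict(page, lambda: [0.9])  # dropped by stage 1
verdict_biased = biased(page, lambda: [0.9])  # forwarded to stage 2
```

Passing the rich features as a callable keeps the expensive extraction lazy, which is where a cascade of this kind would save time: pages rejected by the first stage never trigger it.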
Cite this article: Zheng-qi WANG, Xiao-bing FENG, Chi ZHANG. Study of high-speed malicious Web page detection system based on two-step classifier[J]. Chinese Journal of Network and Information Security, 2017, 3(8): 44-60.
λ | ACC | TPR | FPR |
1 | 78.30% | 76.10% | 14.20% |
1.2 | 71.30% | 85.90% | 21.60% |
1.4 | 62.90% | 92.70% | 31.50% |
1.6 | 53.50% | 96.30% | 42.20% |
Table 1 Detection capability of TSMWD-I for different values of λ
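For reference, the three metric columns of Table 1 can be computed from confusion-matrix counts as follows. The sketch assumes the standard definitions with malicious as the positive class, which this page does not spell out; the counts in the usage lines are made up for illustration and are not the paper's data.

```python
def detection_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """ACC, TPR and FPR under the standard definitions (assumed here)."""
    return {
        "ACC": (tp + tn) / (tp + fn + fp + tn),  # share of all pages classified correctly
        "TPR": tp / (tp + fn),                   # share of malicious pages caught
        "FPR": fp / (fp + tn),                   # share of benign pages flagged
    }


# Hypothetical counts: 100 malicious and 900 benign pages.
m = detection_metrics(tp=90, fn=10, fp=180, tn=720)
```

With these hypothetical counts a high TPR coexists with a modest ACC, because benign pages dominate and a fifth of them are flagged. A first-stage filter that prioritizes the true positive rate accepts the same kind of trade-off.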
Algorithm | ACC | Precision | Recall |
KNN | 76.31% | 79.22% | 78.69% |
C4.5 | 82.15% | 85.74% | 85.30% |
CART | 86.40% | 90.48% | 90.44% |
SVM | 93.57% | 92.11% | 91.80% |
Table 2 Mean classification results of different classification algorithms in TSMWD-II
λ | Detection time/s | ACC | TPR | FPR |
1 | 1.49 | 79.28% | 81.10% | 8.81% |
1.2 | 1.64 | 86.60% | 86.71% | 7.95% |
1.4 | 1.81 | 91.14% | 91.76% | 7.69% |
1.6 | 2.24 | 93.14% | 93.87% | 7.62% |
Without TSMWD-I | 3.47 | 93.57% | 94.37% | 7.61% |
Table 3 Overall performance of TSMWD for different values of λ
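The trade-off in Table 3 is easier to read as a speedup relative to running the second layer alone. The short calculation below uses only the detection times and ACC values from Table 3; the variable names are illustrative.

```python
# Detection time (s) and ACC (%) copied from Table 3, keyed by lambda.
table3 = {
    1.0: (1.49, 79.28),
    1.2: (1.64, 86.60),
    1.4: (1.81, 91.14),
    1.6: (2.24, 93.14),
}
baseline_time, baseline_acc = 3.47, 93.57  # last row of Table 3 (no first layer)

# Speedup over the baseline and accuracy given up, per lambda.
speedup = {lam: baseline_time / t for lam, (t, _) in table3.items()}
acc_drop = {lam: baseline_acc - acc for lam, (_, acc) in table3.items()}

for lam in table3:
    print(f"lambda={lam}: {speedup[lam]:.2f}x faster, ACC lower by {acc_drop[lam]:.2f} points")
```

At λ = 1.4 the two-step system is roughly 1.9 times faster than the second layer alone while giving up about 2.4 percentage points of accuracy.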
Detection scheme | Detection efficiency (ms per item) | ACC | TPR | FPR |
PhishNet | 1 | above 90% | — | — |
WarningBird | 1.5 | 91.53% | 88.84% | 1.23% |
TSMWD (λ=1.4) | 0.58 | 91.14% | 91.76% | 7.69% |
Table 4 Detection efficiency of TSMWD compared with existing detection schemes
Detection scheme | ACC | TPR | FPR |
Justin | 89.82% | 91.0% | 7.60% |
Peter Likarish | 92.0% | 91.0% | 7.60% |
Gang Liu | 91.44% | 91.0% | 7.60% |
TSMWD (λ=1.4) | 91.14% | 91.0% | 7.60% |
Table 5 Detection capability of TSMWD compared with existing detection schemes
[1] | CNNIC. The 37th report of China Internet development statistics[R]. Beijing: CNNIC, 2016. |
[2] | [EB/OL]. |
[3] | PROVOS N, MAVROMMATIS P, RAJAB M A, et al. All your iFRAMEs point to us[C]// Conference on Security Symposium. 2008: 1-15. |
[4] | SHENG S, WARDMAN B, WARNER G, et al. An empirical analysis of phishing blacklists[C]// The Sixth Conference on Email and Anti-Spam (CEAS). 2009. |
[5] | ESHETE B, VILLAFIORITA A, WELDEMARIAM K. Malicious website detection: effectiveness and efficiency issues[C]// SysSec Workshop. 2011: 123-126. |
[6] | Making the Web safer[R/OL]. |
[7] | Malware domain list[EB/OL]. |
[8] | OpenDNS, PhishTank[EB/OL]. |
[9] | PRAKASH P, KUMAR M, KOMPELLA R R, et al. Phishnet: predictive blacklisting to detect phishing attacks[C]// INFOCOM. 2010: 1-5. |
[10] | CHRISTODORESCU M, JHA S. Testing malware detectors[J]. ACM Sigsoft Software Engineering Notes, 2004, 29(4): 34-44. |
[11] | CHOU N, LEDESMA R, TERAGUCHI Y, et al. Client-side defense against Web-based identity theft[C]// The 11th Annual Network & Distributed System Security Symposium (NDSS). 2004: 1-16. |
[12] | HOU Y T, CHANG Y, CHEN T, et al. Malicious Web content detection by machine learning[J]. Expert Systems with Applications, 2010, 37(1): 55-60. |
[13] | ROESCH M. Snort: lightweight intrusion detection for networks[J]. LISA, 1999: 229-238. |
[14] | LIN S F, HOU Y T, CHEN C M, et al. Malicious webpage detection by semantics-aware reasoning[C]// The Eighth International Conference on Intelligent Systems Design and Applications. 2008: 115-120. |
[15] | ZHANG Y, HONG J I, CRANOR L F. Cantina: a content-based approach to detecting phishing web sites[C]// The 16th International Conference on World Wide Web. 2007: 639-648. |
[16] | HOU Y T, CHANG Y, CHEN T, et al. Malicious Web content detection by machine learning[J]. Expert Systems with Applications, 2010, 37(1): 55-60. |
[17] | MA J, SAUL L K, SAVAGE S, et al. Beyond blacklists: learning to detect malicious Web sites from suspicious URLs[C]// The 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2009: 1245-1254. |
[18] | YOO S, KIM S, CHOUDHARY A, et al. Two-phase malicious web page detection scheme using misuse and anomaly detection[J]. International Journal of Reliable Information and Assurance, 2014, 2(1). |
[19] | CANALI D, COVA M, VIGNA G, et al. Prophiler: a fast filter for the large-scale detection of malicious web pages[C]// The 20th International Conference on World Wide Web. 2011: 197-206. |
[20] | The German honeyclient project[EB/OL]. |
[21] | The Honeynet Project. Know your enemy: honeynets[EB/OL]. |
[22] | MAYNOR D. Metasploit toolkit for penetration testing, exploit development, and vulnerability research[M]. Elsevier, 2011. |
[23] | HAUTUS M L J. The formal Laplace transform for smooth linear systems[M]// Mathematical Systems Theory. Berlin: Springer, 1976: 29-47. |
[24] | GOLUB G H, HEATH M, WAHBA G. Generalized cross-validation as a method for choosing a good ridge parameter[J]. Technometrics, 1979, 21(2): 215-223. |
[25] | PRAKASH P, KUMAR M, KOMPELLA R R, et al. Phishnet: predictive blacklisting to detect phishing attacks[C]// INFOCOM. 2010: 1-5. |
[26] | LEE S, KIM J. Warningbird: a near real-time detection system for suspicious URLs in twitter stream[J]. IEEE Transactions on Dependable and Secure Computing, 2013, 10(3): 183-195. |
[27] | LIKARISH P, JUNG E, JO I. Obfuscated malicious javascript detection using classification techniques[C]// The 4th International Conference on Malicious and Unwanted Software (MALWARE). 2009: 47-54. |
[28] | MA J, SAUL L K, SAVAGE S, et al. Beyond blacklists: learning to detect malicious Web sites from suspicious URLs[C]// The 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2009: 1245-1254. |
[29] | LIU G, QIU B, WENYIN L. Automatic detection of phishing target from phishing webpage[C]// The 20th International Conference on Pattern Recognition (ICPR). 2010: 4153-4156. |