Journal on Communications (通信学报) ›› 2019, Vol. 40 ›› Issue (7): 87-94. doi: 10.11959/j.issn.1000-436x.2019089

• Academic paper

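  • About the authors: Lei ZHANG (1996- ), female, born in Guangyuan, Sichuan, is a Ph.D. candidate at the University of Chinese Academy of Sciences. Her research interests include information filtering, content computing, and network security. | Peng ZHANG (1984- ), male, born in Huainan, Anhui, Ph.D., is a research fellow at the Institute of Information Engineering, Chinese Academy of Sciences. His research interests include distributed systems, data mining, and network security. | Wei SUN (1980- ), male, born in Ningwu, Shanxi, is a Ph.D. candidate at Beijing Jiaotong University. His research interests include computer networks, information security, and network measurement. | Xingdong YANG (1994- ), male, born in Zhangjiakou, Hebei, is a master's student at Beihang University. His research interests include network stream data processing and cyberspace security. | Lichao XING (1993- ), male, born in Harbin, Heilongjiang, is a master's student at the University of Chinese Academy of Sciences. His research interests include information filtering and content computing.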

IMM4HT: an identification method of malicious mirror website for high-speed network traffic

Lei ZHANG1,2, Peng ZHANG2, Wei SUN3, Xingdong YANG4, Lichao XING1,2

  1. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
  2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  3. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
  4. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  • Revised: 2019-03-04 Online: 2019-07-25 Published: 2019-07-30
  • Supported by:
    The National Key Research and Development Program of China (2016YFB0801300); The National Natural Science Foundation of China (61602474, 61602467, 61702552)

Abstract:

To address the problem that harmful information is spread through mirror websites in order to evade detection, an identification method of malicious mirror websites for high-speed network traffic is proposed. First, fragmented data is extracted from the traffic and the webpage source code is restored, with a standardization step added to improve identification accuracy. Next, the source code is divided into blocks and a hash value is computed for each block using the simhash algorithm, yielding a simhash value for the webpage source code; the similarity between webpage source codes is then measured by the Hamming distance. Finally, a snapshot of the page is taken, its SIFT feature points are extracted, and a perceptual hash value is obtained through clustering analysis and mapping; webpage similarity is computed from these perceptual hash values. Experiments on real traffic show that the method achieves an accuracy of 93.42%, a recall of 90.20%, an F value of 0.92, and a processing delay of 20 μs. The proposed method can therefore effectively detect malicious mirror webpages in high-speed network traffic.

Key words: malicious mirror website, simhash algorithm, webpage similarity
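
The source-code comparison step summarized in the abstract (block the page source, hash each block, combine the hashes into a simhash value, compare by Hamming distance) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not specify how blocks are formed, which per-block hash is used, or the fingerprint width, so the choices here (overlapping 3-token blocks, MD5 truncated to 64 bits, and the helper names `tokenize`, `blocks`, `simhash`, `hamming_distance`) are assumptions made for illustration only.

```python
import hashlib
import re

BITS = 64  # assumed fingerprint width; the abstract does not state one


def tokenize(source: str) -> list[str]:
    """Crude normalization (an assumption): lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", source.lower())


def blocks(tokens: list[str], size: int = 3) -> list[str]:
    """Split a token list into overlapping blocks of `size` tokens."""
    if not tokens:
        return []
    if len(tokens) < size:
        return [" ".join(tokens)]
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def simhash(source: str) -> int:
    """Combine per-block hashes into one BITS-wide simhash fingerprint."""
    counts = [0] * BITS
    for block in blocks(tokenize(source)):
        digest = hashlib.md5(block.encode("utf-8")).digest()
        h = int.from_bytes(digest[:BITS // 8], "big")
        # Each block votes +1 or -1 on every bit position.
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:  # the majority of blocks had this bit set
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    original = "<html><body><h1>Daily News</h1><p>Top stories of the day</p></body></html>"
    mirror = "<html><body><h1>Daily News</h1><p>Top stories of the day!</p><!-- m --></body></html>"
    other = "<html><body><form><input name='q'><button>Search</button></form></body></html>"
    print("original vs mirror:", hamming_distance(simhash(original), simhash(mirror)))
    print("original vs other: ", hamming_distance(simhash(original), simhash(other)))
```

A small Hamming distance suggests near-duplicate source code. The decision threshold is a tuning parameter that the abstract does not report, so none is fixed here. The snapshot-based step (SIFT feature points, clustering, and perceptual hashing) is not covered by this sketch.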
