Journal on Communications ›› 2019, Vol. 40 ›› Issue (7): 87-94. doi: 10.11959/j.issn.1000-436x.2019089

• Papers •

IMM4HT: an identification method of malicious mirror website for high-speed network traffic

Lei ZHANG1,2, Peng ZHANG2, Wei SUN3, Xingdong YANG4, Lichao XING1,2

  1. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
  2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  3. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
  4. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  • Revised: 2019-03-04 Online: 2019-07-25 Published: 2019-07-30
  • Supported by:
    The National Key Research and Development Program of China (2016YFB0801300); The National Natural Science Foundation of China (61602474); The National Natural Science Foundation of China (61602467); The National Natural Science Foundation of China (61702552)

Abstract:

To address the problem that information harmful to the network environment is distributed through mirror websites in order to bypass detection, a method for identifying malicious mirror websites in high-speed network traffic is proposed. First, fragmented data is extracted from the traffic and the source code of the webpage is restored. Next, a standardized processing module is applied to improve accuracy. The source code of the webpage is then divided into blocks, and the hash value of each block is calculated with the simhash algorithm, yielding a simhash value for the webpage source code; the similarity between webpage source codes is calculated from the Hamming distance between their simhash values. A snapshot of the page is then taken and SIFT feature points are extracted, and a perceptual hash value is obtained through cluster analysis and mapping. Finally, the similarity of webpages is calculated from the perceptual hash values. Experiments on real traffic show that the method achieves an accuracy of 93.42%, a recall of 90.20%, an F value of 0.92, and a processing delay of 20 μs. The proposed method can effectively detect malicious mirror websites in a high-speed network traffic environment.
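
The source-code comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the normalization rules, the block-splitting rule (one block per tag or text run), the 64-bit fingerprint width, and the use of MD5 as the per-block hash are all assumptions made for illustration, and every function name is hypothetical.

```python
import hashlib
import re
from typing import List

HASH_BITS = 64  # assumed fingerprint width; the abstract does not state one


def normalize(source: str) -> str:
    """Assumed stand-in for the standardization step: lowercase, collapse whitespace."""
    return re.sub(r"\s+", " ", source.lower()).strip()


def split_blocks(source: str) -> List[str]:
    """Assumed blocking rule: each tag or text run between tags is one block."""
    tokens = re.findall(r"<[^>]+>|[^<]+", source)
    return [t.strip() for t in tokens if t.strip()]


def block_hash(block: str) -> int:
    """Hash one block to HASH_BITS bits (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(block.encode("utf-8")).digest()
    return int.from_bytes(digest[: HASH_BITS // 8], "big")


def simhash(source: str) -> int:
    """Combine per-block hashes into one fingerprint by per-bit majority vote."""
    weights = [0] * HASH_BITS
    for block in split_blocks(normalize(source)):
        h = block_hash(block)
        for bit in range(HASH_BITS):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def similarity(a: int, b: int) -> float:
    """Map Hamming distance to a score in [0, 1]; 1.0 means identical fingerprints."""
    return 1.0 - hamming_distance(a, b) / HASH_BITS
```

In this sketch, two pages whose source differs only in letter case and whitespace normalize to the same blocks and therefore receive the same fingerprint. A deployment would compare `hamming_distance` against a threshold to decide whether a page mirrors a known malicious one; the abstract does not give that threshold, so none is assumed here.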

Key words: malicious mirror website, simhash algorithm, webpage similarity
