Journal on Communications, 2015, Vol. 36, Issue (Z1): 141-148. doi: 10.11959/j.issn.1000-436x.2015293

• Academic paper

Efficient segment pattern based method for malicious URL detection

Hai-lun LIN¹, Yan LI², Wei-ping WANG¹, Yin-liang YUE¹, Zheng LIN¹

  1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  2. National Computer Network Emergency Response and Coordination Center, Beijing 100029, China
  • Online: 2015-11-25   Published: 2015-12-29
  • Supported by:
    The National High Technology Research and Development Program of China (863 Program); The National Natural Science Foundation of China

Abstract:

An efficient segment-pattern-based method for detecting malicious URLs is proposed. First, the method parses annotated malicious URLs into three semantic segments: the domain name, the path name, and the file name. Second, it builds an inverted index whose terms are tri-grams and uses it to quickly compute the patterns of each semantic segment. Finally, it decides whether a given URL is malicious based on the segment patterns found by searching the inverted index. The method also supports a Jaccard-based random domain name identification technique for detecting malicious URLs that contain random domain names. Experimental results show that the proposed method outperforms state-of-the-art baseline methods and achieves good efficiency and scalability on malicious URL detection.

Key words: malicious URL, segment pattern, tri-gram, inverted index, random domain name
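
The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`split_segments`, `trigrams`, `build_index`, `jaccard`, `best_match`) are hypothetical, tri-grams are assumed to be character-level, and Jaccard similarity over tri-gram sets is used here only as a simple matching score. The paper's actual pattern computation and its random domain name identification procedure are not described on this page and are not reproduced.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def split_segments(url):
    """Split a URL into (domain, path, file) segments (illustrative convention)."""
    # Assumption: URLs given without a scheme are treated as HTTP.
    parts = urlsplit(url if "://" in url else "http://" + url)
    domain = parts.hostname or ""
    # Everything before the last "/" is the path segment, the rest the file name.
    directory, _, filename = parts.path.rpartition("/")
    return domain, directory, filename


def trigrams(text):
    """Return the set of character tri-grams of a string."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(segments):
    """Inverted index: tri-gram -> set of known malicious segments containing it."""
    index = defaultdict(set)
    for seg in segments:
        for gram in trigrams(seg):
            index[gram].add(seg)
    return index


def jaccard(a, b):
    """Jaccard similarity of two sets; taken to be 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(segment, index):
    """Return (score, known_segment) for the indexed segment most similar to `segment`."""
    query = trigrams(segment)
    # Only segments sharing at least one tri-gram with the query are candidates.
    candidates = set()
    for gram in query:
        candidates |= index.get(gram, set())
    scored = ((jaccard(query, trigrams(c)), c) for c in candidates)
    return max(scored, default=(0.0, None))


if __name__ == "__main__":
    # Made-up file-name segments standing in for an annotated malicious set.
    file_index = build_index(["login.php", "update.exe"])
    _, _, name = split_segments("http://evil.example.com/a/b/login.php?x=1")
    print(best_match(name, file_index))
```

In this sketch the inverted index only narrows the candidate set: a query segment is compared against known segments that share at least one tri-gram with it, which avoids scanning the full set on every lookup. A decision threshold on the score would be needed to flag a URL, and its value is left open here.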
