Journal on Communications, 2016, Vol. 37, Issue (Z1): 116-124. doi: 10.11959/j.issn.1000-436x.2016257


Phishing attacks discovery based on HTML layout similarity

Xue-qiang ZOU1,2, Peng ZHANG1, Cai-yun HUANG, Zhi-peng CHEN1, Yong SUN1, Qing-yun LIU1

  1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  2. National Computer Network Emergency Response and Coordination Center, Beijing 100029, China
  • Online: 2016-10-25   Published: 2017-01-17
  • Supported by:
    The National Natural Science Foundation of China; The National Natural Science Foundation of China; The National Natural Science Foundation of China; The National High Technology Research and Development Program of China (863 Program)

Abstract:

Phishing pages typically reproduce the layout structure of the genuine sites they imitate. Building on this observation, an approach to discovering phishing pages based on page layout similarity is presented. First, tags that carry link attributes are extracted from a page as features, and tag sequence branches built from these features are used to identify the page. Next, a page tag sequence tree alignment algorithm (HTMLTagAntiPhish) converts the alignment of tag sequence trees into the alignment of tag sequence branches, reducing a two-dimensional tree structure to one-dimensional strings. Finally, alignment scores are computed quickly with a BLOSUM62-coded substitution matrix borrowed from bioinformatics, which improves phishing detection. Simulation experiments show that the approach is feasible and achieves high precision and recall.

Key words: layout similarity, phishing attack, tag sequence tree
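The four steps in the abstract (link-tag features, tag sequence branches, branch-level alignment, substitution-matrix scoring) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' HTMLTagAntiPhish implementation: the set of link attributes (`href`, `src`, `action`), the gap penalty, the page-level aggregation in `page_similarity`, and the `simple_score` function are all hypothetical, and `simple_score` is only a crude stand-in for the BLOSUM62 matrix the paper uses. Branches are compared with the standard Needleman-Wunsch global alignment.

```python
from html.parser import HTMLParser

# Assumption: the abstract does not list which attributes count as
# "link attributes"; href/src/action is a hypothetical choice.
LINK_ATTRS = {"href", "src", "action"}
# HTML void elements have no end tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class BranchExtractor(HTMLParser):
    """Records the root-to-tag path of every tag with a non-empty link attribute."""

    def __init__(self):
        super().__init__()
        self.stack = []     # tags currently open
        self.branches = []  # one tuple of tag names per link-bearing tag

    def handle_starttag(self, tag, attrs):
        if any(name in LINK_ATTRS and value for name, value in attrs):
            self.branches.append(tuple(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def extract_branches(html):
    parser = BranchExtractor()
    parser.feed(html)
    parser.close()
    return parser.branches


def simple_score(a, b):
    # Hypothetical stand-in for BLOSUM62: +5 for identical tags, -1 otherwise.
    return 5 if a == b else -1


GAP = -4  # hypothetical linear gap penalty


def nw_score(x, y, score=simple_score, gap=GAP):
    """Needleman-Wunsch global alignment score of two tag sequences."""
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * gap] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = max(prev[j - 1] + score(x[i - 1], y[j - 1]),  # (mis)match
                         prev[j] + gap,                            # gap in y
                         cur[j - 1] + gap)                         # gap in x
        prev = cur
    return prev[-1]


def page_similarity(branches_a, branches_b, score=simple_score):
    """Hypothetical aggregation: mean over A's branches of the best match in B,
    each normalised by the branch's self-alignment score."""
    if not branches_a or not branches_b:
        return 0.0
    total = 0.0
    for a in branches_a:
        best = max(nw_score(a, b, score) for b in branches_b)
        total += best / nw_score(a, a, score)
    return total / len(branches_a)


genuine = ('<html><body><div><a href="/login">Sign in</a>'
           '<img src="logo.png"></div></body></html>')
phish = ('<html><body><div><a href="http://phish.example/login">Sign in</a>'
         '<img src="logo.png"></div></body></html>')
other = '<html><body><p><span><a href="/news">News</a></span></p></body></html>'

g = extract_branches(genuine)
print(g)  # [('html', 'body', 'div', 'a'), ('html', 'body', 'div', 'img')]
print(page_similarity(g, extract_branches(phish)))  # 1.0
print(page_similarity(g, extract_branches(other)))  # 0.35
```

Scoring each branch against its best counterpart is what turns the tree comparison into a set of one-dimensional string alignments. To move closer to the paper's scoring, one could map each tag to an amino-acid letter (another assumption) and pass a `score` function that reads the real BLOSUM62 values, for example from Biopython's `Bio.Align.substitution_matrices.load("BLOSUM62")`.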
