Journal on Communications ›› 2014, Vol. 35 ›› Issue (9): 32-39. doi: 10.3969/j.issn.1000-436x.2014.09.004

• Papers Ⅰ: Network Attack and Defense •

Light-weight self-learning approach for URL classification

Hong-zhou SHA1,2,3, Zhou ZHOU2,3, Qing-yun LIU2,3, Peng QIN2,3

  1 Department of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2 Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  3 National Engineering Laboratory for Information Security Technology, Beijing 100093, China
  • Online: 2014-09-25 Published: 2017-06-14
  • Supported by:
    The National High Technology Research and Development Program of China (863 Program); The National Natural Science Foundation of China

Abstract:

A self-learning, light-weight approach to web page classification, SLW, is proposed. SLW is the first to introduce the concept of access relations, which gives it feedback and self-learning characteristics. Starting from a seed set of known malicious pages, SLW automatically identifies users with low credibility, together with their access relations, from the seed set and the visit relation database. The access records of these low-credibility users for other pages are then used to discover previously unknown malicious URLs. Experimental results indicate that, on the same data set, SLW significantly improves the effectiveness of malicious web page detection and substantially reduces the average detection time compared with conventional detection methods.

Key words: URL classification, blacklist, access relation, malicious Web page, Web page evaluation
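
The feedback loop outlined in the abstract (seed set of malicious pages, then low-credibility users, then further malicious pages) can be illustrated with a minimal sketch. The abstract does not give the paper's credibility formula or decision rules, so everything specific here is an assumption made for illustration: the function name `slw_sketch`, the `(user, url)` log format, the ratio-based credibility test, and both thresholds.

```python
from collections import defaultdict


def slw_sketch(access_log, seed_malicious,
               user_threshold=0.5, page_threshold=2, max_rounds=5):
    """Illustrative propagation over access relations (not the paper's exact method).

    access_log: iterable of (user, url) pairs, standing in for the visit relation database.
    seed_malicious: set of URLs already known to be malicious.
    """
    # Build the access relation: which URLs each user has visited.
    urls_by_user = defaultdict(set)
    for user, url in access_log:
        urls_by_user[user].add(url)

    malicious = set(seed_malicious)
    low_cred = set()
    for _ in range(max_rounds):
        # Assumed rule: a user has low credibility when at least
        # `user_threshold` of the distinct URLs they visited are flagged.
        low_cred = {
            user for user, urls in urls_by_user.items()
            if len(urls & malicious) / len(urls) >= user_threshold
        }
        # Count, for each unflagged URL, how many low-credibility users visited it.
        votes = defaultdict(int)
        for user in low_cred:
            for url in urls_by_user[user] - malicious:
                votes[url] += 1
        # Assumed rule: flag a URL once enough low-credibility users visited it.
        newly_flagged = {url for url, n in votes.items() if n >= page_threshold}
        if not newly_flagged:
            break  # nothing new was learned, so the feedback loop stops
        malicious |= newly_flagged
    return malicious, low_cred


log = [
    ("u1", "bad1"), ("u1", "bad2"), ("u1", "x"),
    ("u2", "bad1"), ("u2", "x"),
    ("u3", "good"), ("u3", "x"), ("u3", "news"),
]
found, low_cred_users = slw_sketch(log, {"bad1", "bad2"})
```

In this toy log, the two users who mostly visit seed pages are marked as low-credibility, and the one page they both visit is flagged in turn. Each round feeds newly flagged pages back into the credibility test, which is the self-learning aspect the abstract refers to.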
