Journal on Communications ›› 2021, Vol. 42 ›› Issue (10): 162-172. doi: 10.11959/j.issn.1000-436x.2021181

• Academic Papers •

Malicious domain name detection method based on associated information extraction

Bin ZHANG1,2, Renjie LIAO1,2

  1 Department of Cryptogram Engineering, Information Engineering University, Zhengzhou 450001, China
  2 He’nan Province Key Laboratory of Information Security, Zhengzhou 450001, China
  • Revised: 2021-07-01 Online: 2021-10-25 Published: 2021-10-01
  • About the authors: Bin ZHANG (1969- ), male, born in Nanyang, Henan; Ph.D., professor and doctoral supervisor at Information Engineering University; his main research interest is information system security.
    Renjie LIAO (1996- ), male, born in Luzhou, Sichuan; master's student at Information Engineering University; his main research interest is malicious domain name detection based on machine learning.
  • Supported by:
    The Open Fund Project of Information Assurance Technology Key Laboratory (KJ-15-109); The New Research Direction Cultivation Fund of Information Engineering University (2016604703); The Research Project of Information Engineering University (2019f3303)

Abstract:

To improve the accuracy of malicious domain name detection based on domain name association information, a detection method combining domain name resolution information and query time was proposed. Firstly, domain name resolution records were represented as nodes and edges in a heterogeneous information network, so that heterogeneous domain name data could be represented together and a high utilization rate of domain name information was obtained. Secondly, to avoid the high time complexity of extracting associated information by multiplying sparse adjacency matrices, an efficient breadth-first network traversal algorithm based on meta-paths was proposed, which improved the efficiency of extracting associated resolution information. Then, because weakly connected domain names lack associated resolution information and are therefore missed by detection, query time was introduced to characterize the correlation between domain names, which improved the coverage rate of detected samples. Finally, a domain name representation learning algorithm with adaptive weights was designed to vectorize both the associated resolution information and the associated query time information. The Euclidean distance between domain name feature vectors was used to quantify the correlation between domain names, and a supervised classifier was built on the learned vectors to detect malicious domain names. Theoretical analysis and experimental results show that the proposed method extracts domain name association information efficiently, and that it achieves a detection coverage rate of 97.7% and an F1 score of 0.951.

Key words: malicious domain name detection, heterogeneous information network, domain name resolution information, query time, representation learning
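
The meta-path-based breadth-first traversal mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the node types (`"domain"`, `"ip"`), the meta-path `("domain", "ip", "domain")`, and the helper names `build_graph` and `metapath_neighbors` are assumptions chosen for illustration. The idea shown is that associated domain names can be collected by expanding a typed graph one level per meta-path step, instead of multiplying sparse adjacency matrices.

```python
def build_graph(records):
    """Build an undirected typed graph from (domain, ip) resolution records.

    Nodes are (type, name) tuples; this encoding is a hypothetical choice.
    """
    neighbors = {}
    for domain, ip in records:
        d, i = ("domain", domain), ("ip", ip)
        neighbors.setdefault(d, set()).add(i)
        neighbors.setdefault(i, set()).add(d)
    return neighbors


def metapath_neighbors(neighbors, start, metapath):
    """Return nodes reached from `start` by following `metapath` level by level.

    `metapath` is a sequence of node types, e.g. ("domain", "ip", "domain").
    """
    if start[0] != metapath[0]:
        raise ValueError("start node type must match the first meta-path type")
    frontier = {start}
    for node_type in metapath[1:]:
        # Expand one breadth-first level, keeping only nodes of the required type.
        frontier = {
            nxt
            for node in frontier
            for nxt in neighbors.get(node, ())
            if nxt[0] == node_type
        }
    frontier.discard(start)  # a node is not reported as its own associate
    return frontier


# Example records (documentation-reserved names and addresses).
records = [
    ("a.example", "192.0.2.1"),
    ("b.example", "192.0.2.1"),
    ("c.example", "192.0.2.2"),
]
graph = build_graph(records)
associated = metapath_neighbors(
    graph, ("domain", "a.example"), ("domain", "ip", "domain")
)
print(associated)  # → {('domain', 'b.example')}
```

In this sketch only the nodes actually reachable from the starting domain are visited, which is the intuition behind preferring traversal over products of large sparse matrices.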

