Malicious domain name detection method based on associated information extraction

doi:10.11959/j.issn.1000-436x.2021181

Journal on Communications, 2021, Vol. 42, Issue (10): 162-172.

Bin ZHANG¹,², Renjie LIAO¹,²

¹ Department of Cryptogram Engineering, Information Engineering University, Zhengzhou 450001, China
² He’nan Province Key Laboratory of Information Security, Zhengzhou 450001, China

Revised: 2021-07-01 Online: 2021-10-25 Published: 2021-10-01
About the authors: ZHANG Bin (1969- ), male, born in Nanyang, Henan; Ph.D., professor and doctoral supervisor at Information Engineering University; main research interest: information system security.
LIAO Renjie (1996- ), male, born in Luzhou, Sichuan; master's student at Information Engineering University; main research interest: malicious domain name detection based on machine learning.
Supported by:
The Open Fund Project of Information Assurance Technology Key Laboratory (KJ-15-109); The New Research Direction Cultivation Fund of Information Engineering University (2016604703); The Research Project of Information Engineering University (2019f3303)

Abstract:

To improve the accuracy of malicious domain name detection based on domain name association information, a detection method combining domain name resolution information with query time was proposed. Firstly, domain name resolution records were represented as nodes and edges in a heterogeneous information network, so that heterogeneous domain name data could be characterized together and a high utilization rate of domain name information obtained. Secondly, to avoid the high time complexity of extracting associated information by multiplying sparse adjacency matrices, a breadth-first network traversal algorithm based on meta-paths was proposed, which improved the efficiency of extracting associated resolution information. Then, to address the missed detection of weakly connected domain names that lack associated resolution information, query time was introduced to characterize the correlation between domain names, which improved the coverage rate of detected samples. Finally, a representation learning algorithm with adaptive weights was designed to vectorize the associated resolution information and the associated query time information of domain names; the Euclidean distance between domain name feature vectors was used to quantify the correlation between domain names, and a supervised classifier was then constructed to detect malicious domain names. Theoretical analysis and experimental results show that the proposed method extracts domain name association information efficiently, and that the detection coverage rate and F1 score are 97.7% and 0.951, respectively.

Key words: malicious domain name detection, heterogeneous information network, domain name resolution information, query time, representation learning
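
The meta-path-based breadth-first traversal mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose details are not given on this page. The node types (`domain`, `ip`), the meta-path domain-ip-domain, the function names `build_hin` and `metapath_bfs`, and the sample records are all assumptions made for illustration. The idea shown is that expanding only the neighbours whose type matches the next step of the meta-path yields the associated domain names directly, without multiplying sparse adjacency matrices.

```python
from collections import defaultdict


def build_hin(records):
    """Build a typed adjacency map from (domain, ip) resolution records.

    Nodes are (type, name) tuples; the two node types are assumptions.
    """
    adj = defaultdict(set)
    for domain, ip in records:
        d, i = ("domain", domain), ("ip", ip)
        adj[d].add(i)
        adj[i].add(d)
    return adj


def metapath_bfs(adj, start, metapath):
    """Count meta-path instances from `start` to every reachable end node.

    The frontier maps each node reached at the current level to the number
    of path instances leading to it. Only neighbours whose type matches the
    next type in `metapath` are expanded.
    """
    if start[0] != metapath[0]:
        raise ValueError("start node type must match the head of the meta-path")
    frontier = {start: 1}
    for next_type in metapath[1:]:
        next_frontier = defaultdict(int)
        for node, count in frontier.items():
            for nbr in adj.get(node, ()):
                if nbr[0] == next_type:
                    next_frontier[nbr] += count
        frontier = next_frontier
    frontier.pop(start, None)  # drop paths that lead back to the start node
    return dict(frontier)


# Hypothetical resolution records (documentation-only names and addresses).
records = [
    ("a.example", "192.0.2.1"),
    ("b.example", "192.0.2.1"),
    ("b.example", "192.0.2.2"),
    ("c.example", "192.0.2.2"),
    ("a.example", "192.0.2.3"),
    ("b.example", "192.0.2.3"),
]
hin = build_hin(records)
related = metapath_bfs(hin, ("domain", "a.example"), ("domain", "ip", "domain"))
```

In this toy network `related` maps `b.example` to 2, because it shares two IP addresses with `a.example`, while `c.example` is not reached by the domain-ip-domain meta-path.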

CLC number: TP393

Bin ZHANG, Renjie LIAO. Malicious domain name detection method based on associated information extraction[J]. Journal on Communications, 2021, 42(10): 162-172.


References

[1]	ZHAUNIAROVICH Y, KHALIL I, YU T, et al. A survey on malicious domains detection through DNS data analysis[J]. ACM Computing Surveys, 2018, 51(4): 1-36.
[2]	GAO H Y, YEGNESWARAN V, JIANG J, et al. Reexamining DNS from a global recursive resolver perspective[J]. IEEE/ACM Transactions on Networking, 2016, 24(1): 43-57.
[3]	WANG X, ZHENG K F, NIU X X, et al. Detection of command and control in advanced persistent threat based on independent access[C]// Proceedings of 2016 IEEE International Conference on Communications (ICC). Piscataway: IEEE Press, 2016: 1-6.
[4]	PENG C W, YUN X C, ZHANG Y Z, et al. Detecting malicious domains using co-occurrence relation between DNS query[J]. Journal of Computer Research and Development, 2019, 56(6): 1263-1274.
[5]	YEDIDIA J S, FREEMAN W T, WEISS Y. Understanding belief propagation and its generalizations[J]. Exploring Artificial Intelligence in the New Millennium, 2003, 8: 236-239.
[6]	MANADHATA P K, YADAV S, RAO P, et al. Detecting malicious domains via graph inference[M]. Cham: Springer International Publishing, 2014.
[7]	KHALIL I, YU T, GUAN B. Discovering malicious domains through passive DNS data graph analysis[C]// Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. New York: ACM Press, 2016: 663-674.
[8]	LEE J, LEE H. GMAD: graph-based malware activity detection by DNS traffic analysis[J]. Computer Communications, 2014, 49: 33-47.
[9]	ZANG X D, GONG J, HU X Y. Detecting malicious domain names based on AGD[J]. Journal on Communications, 2018, 39(7): 15-25.
[10]	PENG C W, YUN X C, ZHANG Y Z, et al. Discovering malicious domains through alias-canonical graph[C]// Proceedings of 2017 IEEE Trustcom/BigDataSE/ICESS. Piscataway: IEEE Press, 2017: 225-232.
[11]	ZOU F T, ZHANG S Y, RAO W X, et al. Detecting malware based on DNS graph mining[J]. International Journal of Distributed Sensor Networks, 2015, 2015: 1-12.
[12]	SUN Y Z, HAN J W. Mining heterogeneous information networks: principles and methodologies[J]. Synthesis Lectures on Data Mining and Knowledge Discovery, 2012, 3(2): 1-159.
[13]	TANG J, QU M, WANG M Z, et al. LINE: large-scale information network embedding[C]// Proceedings of the 24th International Conference on World Wide Web. New York: ACM Press, 2015: 1067-1077.
[14]	LEI K, FU Q A, NI J K, et al. Detecting malicious domains with behavioral modeling and graph embedding[C]// Proceedings of 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). Piscataway: IEEE Press, 2019: 601-611.
[15]	PENG C W, YUN X C, ZHANG Y Z, et al. MalShoot: shooting malicious domains through graph embedding on passive DNS data[M]. Cham: Springer International Publishing, 2019.
[16]	SUN X Q, TONG M K, YANG J H. HinDom: a robust malicious domain detection system based on heterogeneous information network with transductive classification[C]// Proceedings of the 22nd International Symposium on Research in Attacks, Intrusions and Defenses. Berkeley: USENIX Association, 2019: 399-412.
[17]	KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[J]. arXiv Preprint, arXiv:1609.02907, 2016.
[18]	LIU Z, LI S, ZHANG Y, et al. Ringer: systematic mining of malicious domains by dynamic graph convolutional network[C]// Proceedings of the International Conference on Computational Science. Berlin: Springer, 2020: 379-398.
[19]	SUN X Q, YANG J H, WANG Z L, et al. HGDom: heterogeneous graph convolutional networks for malicious domain detection[C]// Proceedings of 2020 IEEE/IFIP Network Operations and Management Symposium. Piscataway: IEEE Press, 2020: 1-9.
[20]	HE W X, GOU G P, KANG C C, et al. Malicious domain detection via domain relationship and graph models[C]// Proceedings of 2019 IEEE 38th International Performance Computing and Communications Conference (IPCCC). Piscataway: IEEE Press, 2019: 1-8.
[21]	NSFOCUS. 2019 Botnet trend report[R]. NSFOCUS Security Labs, 2020.
[22]	MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[C]// Proceedings of the Advances in Neural Information Processing Systems. Massachusetts: MIT Press, 2013: 3111-3119.
[23]	SCHÜPPEN S, TEUBERT D, HERRMANN P, et al. FANCI: feature-based automated NXDomain classification and intelligence[C]// Proceedings of the 27th USENIX Security Symposium. Berkeley: USENIX Association, 2018: 1165-1181.
[24]	VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9(11): 2579-2605.

Training set ratio	SVM F1 score	SVM accuracy	RF F1 score	RF accuracy
10%	0.783	0.862	0.477	0.798
20%	0.71	0.97	0.65	0.907
30%	0.921	0.978	0.69	0.915
40%	0.936	0.98	0.723	0.922
50%	0.939	0.981	0.742	0.926
60%	0.945	0.981	0.761	0.931
70%	0.951	0.984	0.764	0.933
80%	0.955	0.985	0.781	0.932
90%	0.963	0.987	0.784	0.941

Method	F1 score	Accuracy	C-Rate
MalShoot[15]	0.933	0.961	0.555
MDND-RI	0.954	0.982	0.819
MDND-RIQT-Equal	0.908	0.969	0.977
Proposed method	0.951	0.984	0.977
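
For readers who want to reproduce the kind of figures reported in the table above, the following Python sketch shows how an F1 score, an accuracy, and a coverage rate could be computed. The F1 and accuracy formulas are the standard ones. The coverage rate definition used here (the share of domain names in the evaluation set for which the method has enough association information to produce a decision) is an assumption based on the abstract's description of detection coverage, and the function names and sample data are hypothetical.

```python
def f1_and_accuracy(y_true, y_pred):
    """Return (F1 score, accuracy) for binary labels, 1 = malicious."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return f1, accuracy


def coverage_rate(all_domains, covered_domains):
    """Share of evaluated domains the detector can score (assumed definition)."""
    all_set = set(all_domains)
    if not all_set:
        return 0.0
    return len(all_set & set(covered_domains)) / len(all_set)


# Hypothetical labels and predictions for eight domain names.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
f1, accuracy = f1_and_accuracy(y_true, y_pred)

# Hypothetical evaluation set in which one domain has no association data.
evaluated = ["a.example", "b.example", "c.example", "d.example"]
scored = ["a.example", "b.example", "c.example"]
c_rate = coverage_rate(evaluated, scored)
```

Keeping coverage separate from F1 and accuracy matters when comparing methods, since a detector can score well on the domains it covers while leaving many domains unscored.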
