Auto forensic detecting algorithms of malicious code fragment based on TensorFlow

doi:10.11959/j.issn.2096-109x.2021048

Chinese Journal of Network and Information Security, 2021, Vol. 7, Issue (4): 154-163.

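Binglong LI, Jinlong TONG, Yu ZHANG, Yifeng SUN, Qingxian WANG, Chaowen CHANG

College of Cryptographic Engineering, Information Engineering University, Zhengzhou 450001, China

Revised: 2021-02-01 Online: 2021-08-15 Published: 2021-08-01
About the authors: Binglong LI (1974− ), male, born in Weihui, Henan, Ph.D., associate professor at Information Engineering University. His main research interests include digital investigation and forensics, network intrusion traceback and forensics, cloud computing forensics, and smartphone forensics.
Jinlong TONG (1997− ), male, born in Baoding, Hebei, assistant engineer at Information Engineering University. His main research interest is intelligent information security.
Yu ZHANG (1996− ), male, born in Lianyungang, Jiangsu, master's student at Information Engineering University. His main research interest is smartphone forensics.
Yifeng SUN (1976− ), male, born in Xinxiang, Henan, Ph.D., associate professor at Information Engineering University. His main research interests include artificial intelligence and information security.
Qingxian WANG (1960− ), male, born in Weihui, Henan, professor and doctoral supervisor at Information Engineering University. His main research interests include network and information security.
Chaowen CHANG (1966− ), male, born in Huaxian, Henan, Ph.D., professor and doctoral supervisor at Information Engineering University. His main research interest is network information defense.
Supported by:
The National Natural Science Foundation of China (60903220)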

Abstract:

To address the difficulty of identifying malicious code fragments in the complex, heterogeneous, low-level, and massive evidence data encountered in digital crime investigations, a framework for a TensorFlow-based malicious code fragment detection algorithm was proposed by analyzing the structure and characteristics of TensorFlow deep learning models. A back-propagation training algorithm was designed by analyzing the training process and mechanisms of deep learning. To extract malicious code fragment features from evidence sources on different devices and file systems, including storage media and forensic containers such as AFF, a binary feature pre-processing algorithm that works at the storage-medium level was proposed. To support back-propagation training, an algorithm for building a code fragment data set was designed and implemented. The experimental results show that, for automatic forensic detection of malicious code fragments in different storage media and evidence containers, the method reaches a comprehensive evaluation index F₁ of 0.922 and, compared with antivirus engines such as CrowdStrike, Comodo, and FireEye, has a clear advantage in processing low-level code fragment data.

Key words: auto forensics, deep learning, fully connected neural network, malicious code fragment
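
The abstract describes pre-processing raw binary data from storage media into features for a neural network, but this page does not give the feature definition. The following is a minimal sketch of one common way such a step could look, not the authors' algorithm: it assumes fixed-size 512-byte fragments and a normalized byte-frequency histogram as the feature vector. The names `FRAGMENT_SIZE`, `iter_fragments`, `byte_histogram`, and `fragments_to_features` are hypothetical.

```python
from collections import Counter
from typing import Iterator, List

# Assumption: fragments are 512 bytes (one classic disk sector).
# The paper's actual fragment size is not stated on this page.
FRAGMENT_SIZE = 512


def iter_fragments(raw: bytes, size: int = FRAGMENT_SIZE) -> Iterator[bytes]:
    """Yield consecutive fixed-size fragments of a raw evidence image.

    A short final fragment is zero-padded so all fragments have equal length.
    """
    for offset in range(0, len(raw), size):
        yield raw[offset:offset + size].ljust(size, b"\x00")


def byte_histogram(fragment: bytes) -> List[float]:
    """Return a 256-dimensional vector of normalized byte frequencies.

    Assumption: a byte histogram stands in for the paper's binary features.
    """
    counts = Counter(fragment)  # iterating bytes yields ints in 0..255
    total = len(fragment) or 1  # avoid division by zero on empty input
    return [counts.get(value, 0) / total for value in range(256)]


def fragments_to_features(raw: bytes, size: int = FRAGMENT_SIZE) -> List[List[float]]:
    """Convert a raw byte stream into one feature vector per fragment."""
    return [byte_histogram(fragment) for fragment in iter_fragments(raw, size)]


if __name__ == "__main__":
    # Synthetic stand-in for bytes read from a disk image or forensic container.
    raw_image = bytes(range(256)) * 3 + b"\x90" * 100
    features = fragments_to_features(raw_image)
    print(len(features), "fragments,", len(features[0]), "features each")
```

Because the features depend only on raw bytes, a step of this kind needs no file-system parsing, which is in line with the abstract's emphasis on evidence from different devices and file systems. Each vector could then be fed to a fully connected classifier.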

CLC number:

TP309

Binglong LI, Jinlong TONG, Yu ZHANG, Yifeng SUN, Qingxian WANG, Chaowen CHANG. Auto forensic detecting algorithms of malicious code fragment based on TensorFlow[J]. Chinese Journal of Network and Information Security, 2021, 7(4): 154-163.



Table 1
Method	Precision (P)	Recall (R)	F₁
CrowdStrike	0.667	0.096	0.168
Comodo	0.474	0.086	0.146
FireEye	1.000	0.096	0.175
ClamAV	1.000	0.102	0.185
McAfee	1.000	0.092	0.168
Kaspersky	1.000	0.094	0.172
Proposed method	0.896	0.950	0.922
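
The F₁ column can be recomputed from the other two, assuming P and R are the usual precision and recall and F₁ is their harmonic mean, F₁ = 2PR / (P + R). The short check below uses only the values from the table. The function name `f1_score` and the dictionary `TABLE` are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) copied from the table; the last row is the paper's method.
TABLE = {
    "CrowdStrike": (0.667, 0.096, 0.168),
    "Comodo": (0.474, 0.086, 0.146),
    "FireEye": (1.000, 0.096, 0.175),
    "ClamAV": (1.000, 0.102, 0.185),
    "McAfee": (1.000, 0.092, 0.168),
    "Kaspersky": (1.000, 0.094, 0.172),
    "Proposed method": (0.896, 0.950, 0.922),
}

if __name__ == "__main__":
    for name, (p, r, reported) in TABLE.items():
        computed = f1_score(p, r)
        print(f"{name:16s} computed F1 = {computed:.3f}, reported F1 = {reported:.3f}")
```

The recomputed values agree with the reported column to three decimals. They also show where the gap comes from: several engines have perfect precision, but their recall on fragments stays near 0.1, which caps their F₁ below 0.19.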

