Deep learning vulnerability detection method based on optimized inter-procedural semantics of programs

doi:10.11959/j.issn.2096-109x.2023085

Abstract

In recent years, software vulnerabilities have caused numerous security incidents, and discovering and patching them early can effectively reduce losses. Traditional rule-based vulnerability detection methods rely on rules defined by experts and suffer from a high false negative rate. Deep learning-based methods can automatically learn the latent features of vulnerable programs. However, as software complexity increases, the precision of these methods decreases. On the one hand, most current methods operate at the function level and therefore cannot handle inter-procedural vulnerability samples. On the other hand, models such as BGRU and BLSTM degrade on long input sequences and are poor at capturing long-term dependencies among program statements. To address these issues, the existing program slicing method was optimized: intra-procedural and inter-procedural slicing were combined to analyze the full context of vulnerabilities triggered across functions, capturing the complete causal chain of a vulnerability trigger. In addition, vulnerability detection was performed with a Transformer neural network architecture equipped with a multi-head attention mechanism. This architecture jointly attends to information from different representation subspaces, allowing deep features to be extracted from nodes. Unlike recurrent neural networks, it avoids information decay and effectively learns the syntactic and semantic information of the source program. Experimental results show that the method achieves an F1 score of 73.4% on a real-world software dataset, 13.6 to 40.8 percentage points higher than the compared methods. It also detected several vulnerabilities in open-source software, confirming its effectiveness and applicability.

Key words: vulnerability detection, program slice, deep learning, attention mechanism

CLC Number: TP311

Yan LI, Weizhong QIANG, Zhen LI, Deqing ZOU, Hai JIN. Deep learning vulnerability detection method based on optimized inter-procedural semantics of programs[J]. Chinese Journal of Network and Information Security, 2023, 9(6): 86-101.

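CWE-ID	Vulnerability type description
CWE-119	Buffer overflow: improper operations on a memory buffer that read from or write to a memory location outside the buffer's intended boundary
CWE-125	Out-of-bounds read: the software reads data past the end, or before the beginning, of the intended buffer
CWE-787	Out-of-bounds write: the software writes data past the end, or before the beginning, of the intended buffer
CWE-189	Numeric errors: weaknesses related to incorrect numeric calculation or conversion
CWE-190	Integer overflow or wraparound: the software performs a calculation that can overflow or wrap around when the logic assumes the result is always larger than the original value. Other vulnerabilities may be introduced when the result is used for resource management or execution control
CWE-20	Improper input validation: the software receives input or data but does not validate, or incorrectly validates, that the input is safe or correctly processed; an attacker may exploit this to alter control flow, control arbitrary resources, or execute arbitrary code
CWE-369	Divide by zero: often occurs in calculations involving physical dimensions such as length, width, and height, and may crash the system
CWE-415	Double free: free() is called twice on the same memory address, which may lead to modification of unexpected memory locations
CWE-416	Use after free: memory is referenced after it has been freed, which may cause a crash, use of unexpected values, or code execution
CWE-476	NULL pointer dereference: the program dereferences a pointer that it expects to be valid but is actually NULL, typically causing a crash or exit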

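Source statement feature	Example
Parameter input of the vulnerable function	vul_func(CV,…)
Local variable declaration and definition	CV = func/fread/fopen/…
Global variable declaration and definition	extern int CV = …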

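Vulnerability type	CWE-ID	Key variables and sink statement features
Improper memory operation	CWE-119, CWE-125, CWE-787, CWE-20	Memory-sensitive APIs such as malloc, memcpy, and memset, where the key variable is the parameter related to memory size; array use, where the key variable is the array index; pointer use, where the key variable is the pointer itself or a parameter involved in pointer arithmetic
Improper numeric operation	CWE-189, CWE-190	Integer arithmetic, where the key variables are the operands of the integer operation; may further lead to a buffer overflow vulnerability
	CWE-369	Division or modulo operation, where the key variable is the divisor
Improper pointer use	CWE-415	Double free, where the key variable is the pointer; the vulnerability is triggered when free/delete is called a second time and releases the same memory again
	CWE-416	Use after free, where the key variable is the pointer whose memory has been released; the vulnerability is triggered when the pointer is used again after free/delete has released the memory
	CWE-476	NULL pointer dereference, where the key variable is a pointer that is uninitialized or assigned NULL, such as an uninitialized structure or function pointer; the vulnerability is triggered on its first use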
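
The sink features in the table above can be illustrated with a rough pattern-matching sketch over single C statements. This is only an approximation for illustration: the category names and regular expressions are hypothetical, and a real slicing pipeline would identify sinks through program analysis rather than text matching.

```python
import re

# Hypothetical patterns that loosely mirror the sink features in the table.
# They work on one C statement at a time and ignore comments and strings.
SINK_PATTERNS = {
    # Memory-sensitive API calls (key variable: size-related argument).
    "memory_api": re.compile(r"\b(?:malloc|memcpy|memset)\s*\("),
    # Array use (key variable: the index expression).
    "array_index": re.compile(r"\b\w+\s*\[[^\]]+\]"),
    # Division or modulo (key variable: the divisor).
    "div_mod": re.compile(r"(?<![/*])/(?![/*])|%"),
    # Memory release (key variable: the pointer being freed).
    "free": re.compile(r"\bfree\s*\(|\bdelete\b"),
}


def sink_features(statement):
    """Return the names of all sink patterns found in one statement."""
    return [name for name, pattern in SINK_PATTERNS.items()
            if pattern.search(statement)]


if __name__ == "__main__":
    for stmt in ["memcpy(dst, src, len);", "buf[i] = 0;",
                 "avg = total / count;", "free(ptr);"]:
        print(stmt, "->", sink_features(stmt))
```

Text matching of this kind misfires on format strings such as `"%d"` and cannot tell which variable reaches the sink, which is why slicing from key variables needs data-flow and control-flow information.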

CWE-ID	Number of CVEs	CWE-ID	Number of CVEs
CWE-119	970	CWE-190	95
CWE-125	220	CWE-369	30
CWE-787	88	CWE-415	18
CWE-20	366	CWE-416	129
CWE-189	241	CWE-476	181
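
To make the class balance of the dataset easier to see, the counts in the table above can be turned into shares. The sketch below assumes each CVE is counted under exactly one CWE-ID, which the table does not state.

```python
# CVE counts per CWE-ID, copied from the table above.
cve_counts = {
    "CWE-119": 970, "CWE-125": 220, "CWE-787": 88, "CWE-20": 366,
    "CWE-189": 241, "CWE-190": 95, "CWE-369": 30, "CWE-415": 18,
    "CWE-416": 129, "CWE-476": 181,
}

# Assumption: no CVE appears under more than one CWE-ID.
total = sum(cve_counts.values())
shares = {cwe: count / total for cwe, count in cve_counts.items()}

if __name__ == "__main__":
    print("total CVEs:", total)
    for cwe, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{cwe}: {share:.1%}")
```

Under that assumption the distribution is strongly skewed: CWE-119 alone accounts for roughly two fifths of the samples, while CWE-415 and CWE-369 are each a small fraction.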

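Parameter	Setting	Parameter	Setting
Encoder layers	3	Attention heads	8
Loss function	Cross-entropy loss	Optimizer	Adam
Learning rate	0.0005	Dropout rate	0.5
Batch size	16	Training epochs	50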
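
The role of the 8 attention heads in the table above can be sketched in plain Python. This is a simplified illustration, not the model used in the paper: the learned query, key, value, and output projections are omitted, so each head simply applies scaled dot-product attention to its own slice of the embedding. The 16-dimensional embedding and the input values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for a single head."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this token to every token, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs


def multi_head_self_attention(tokens, num_heads=8):
    """Self-attention with the embedding split evenly across heads.

    Simplification: no learned projections, so head h attends over
    dimensions [h * d_head, (h + 1) * d_head) of the input.
    """
    d_model = len(tokens[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    head_outputs = []
    for h in range(num_heads):
        part = [t[h * d_head:(h + 1) * d_head] for t in tokens]
        head_outputs.append(attention(part, part, part))
    # Concatenate the per-head outputs back into d_model-sized vectors.
    return [[x for head in head_outputs for x in head[i]]
            for i in range(len(tokens))]


# Hypothetical input: 4 tokens with 16-dimensional embeddings.
tokens = [[math.sin(i + j) for j in range(16)] for i in range(4)]
out = multi_head_self_attention(tokens, num_heads=8)
```

Because every token attends to every other token in one step, the path between two distant statements has constant length, unlike a recurrent model where information must pass through every intermediate step.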

CPU	GPU	Memory	Disk	Software	Operating system
Intel Xeon Gold 6234 CPU @ 3.30 GHz	Quadro RTX 5000	132 GB	4 TB	Python2.7, Joern-0.3.1, Neo4j-community-2.1.5, Python3.6, PyTorch-1.5.1, VS Code	Linux version 5.4.0-77-generic

Slicing method	Acc	P	R	F1
SySeVR	60.2%	63.8%	59.6%	61.6%
MVP	61.8%	64.7%	61.5%	63.1%
This paper	71.4%	75.0%	71.9%	73.4%

Model	Acc	P	R	F1
BGRU	57.4%	55.0%	60.4%	57.5%
BLSTM	59.1%	56.5%	63.3%	59.7%
Transformer	71.4%	75.0%	71.9%	73.4%

Detection method	Acc	P	R	F1
Checkmarx	50.2%	50.3%	24.2%	32.6%
FlawFinder	50.4%	50.7%	26.6%	34.9%
SySeVR	60.0%	51.2%	57.7%	54.2%
Devign	58.8%	58.5%	61.1%	59.8%
This paper	71.4%	75.0%	71.9%	73.4%
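
The F1 values in the detection-method table above follow from precision and recall as their harmonic mean, F1 = 2PR / (P + R). The short check below recomputes them from the rounded P and R columns and derives the gap to the weakest and strongest baselines; small differences of about 0.1 are expected because the inputs are already rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same unit for both)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the table above.
reported = {
    "Checkmarx": (50.3, 24.2, 32.6),
    "FlawFinder": (50.7, 26.6, 34.9),
    "SySeVR": (51.2, 57.7, 54.2),
    "Devign": (58.5, 61.1, 59.8),
    "This paper": (75.0, 71.9, 73.4),
}

recomputed = {name: f1_score(p, r) for name, (p, r, _) in reported.items()}

# Gap in percentage points between this paper and each baseline.
ours = reported["This paper"][2]
gaps = {name: ours - f1 for name, (_, _, f1) in reported.items()
        if name != "This paper"}

if __name__ == "__main__":
    for name, value in recomputed.items():
        print(f"{name}: recomputed F1 = {value:.1f}%")
    print(f"gap range: {min(gaps.values()):.1f} to {max(gaps.values()):.1f} points")
```

The smallest gap is to Devign and the largest is to Checkmarx, which matches the 13.6 to 40.8 range quoted in the abstract.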

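Vulnerability type	Vulnerable file path	Root cause analysis
CWE-476	src/storage/gstor/zekernel/kernel/table/*.c	A pointer set to NULL may be dereferenced; it must be initialized before use
CWE-415	src/gausskernel/optimizer/commands/*.cpp	Risk of double free; the pointer should be set to NULL promptly after it is freed
CWE-369	src/common/backend/utils/adt/*.cpp	The divisor is initialized to 0, but no zero check is performed before dividing by it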
