mVulSniffer: a multi-type source code vulnerability sniffer method

doi:10.11959/j.issn.1000-436x.2023184

Abstract

The code slices used by existing deep learning-based vulnerability sniffer methods cannot fully capture the subtle features that distinguish vulnerability classes, and a single deep learning sniffer model has limited ability to learn long-range contextual dependencies between code statements across files and functions. To address these problems, a multi-type source code vulnerability sniffer method was proposed. Firstly, fine-grained two-level slices containing the vulnerability types were extracted based on the control dependency and data dependency information in the program dependency graph. Secondly, the two-level slices were transformed into initial feature vectors. Finally, a fusion deep learning sniffer model suited to two-level slices was constructed to achieve accurate detection of multi-type source code vulnerabilities. Experimental results on multiple synthetic datasets and two real datasets show that the proposed method outperforms existing multi-type source code vulnerability sniffer methods.
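The slicing step named in the abstract follows control and data dependencies in a program dependency graph. The idea can be illustrated with a generic backward slice over a toy graph. This is a minimal sketch only: the graph, the statement numbers, and the `backward_slice` helper are hypothetical, and the code does not reproduce the paper's two-level slicing procedure, which this page does not describe in detail.

```python
from collections import deque


def backward_slice(deps, criterion):
    """Collect every statement the criterion depends on, directly or
    transitively, by walking dependency edges backwards (breadth-first)."""
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        node = queue.popleft()
        # Each edge is (statement depended on, kind of dependency).
        for pred, _kind in deps.get(node, []):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return sorted(seen)


# Hypothetical toy PDG: statement number -> statements it depends on.
toy_pdg = {
    1: [],                               # declare a buffer
    2: [],                               # read an input length
    3: [(2, "data")],                    # branch on the length
    4: [(1, "data"), (3, "control")],    # copy into the buffer, guarded by 3
    5: [(1, "data")],                    # unrelated later use of the buffer
}

# Slice with respect to statement 4 (the risky copy in this toy example).
slice_for_copy = backward_slice(toy_pdg, 4)
print(slice_for_copy)
```

Statement 5 is left out of the slice because statement 4 does not depend on it, which is the sense in which a slice narrows the code handed to the model.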

Key words: multi-type vulnerability sniffer, deep learning, attention mechanism, data dependency, control dependency

CLC Number: TP311

Xuejun ZHANG, Fenghe ZHANG, Jiyang GAI, Xiaogang DU, Wenjie ZHOU, Teli CAI, Bo ZHAO. mVulSniffer: a multi-type source code vulnerability sniffer method[J]. Journal on Communications, 2023, 44(9): 149-160.

Figures/Tables 8

Method	Acc	W_F1	DT/ms
Russell^[13]	90.59%	88.36%	2.08
μVulDeePecker^[28]	95.38%	95.95%	3.57
SySeVR-BGRU^[25]	95.94%	95.94%	2.10
SySeVR-ABGRU^[25]	96.69%	96.92%	3.62
BERT-base-based^[23]	94.35%	93.60%	5.06
CodeBERT-based^[23]	95.42%	96.33%	5.12
mVulSniffer	97.41%	97.42%	3.96

Method	Acc	W_F1	DT/ms
mVulSniffer (BGRU)	83.80%	83.63%	2.08
mVulSniffer (BGRU+Att)	85.57%	86.60%	2.13
mVulSniffer (BGRU+CNN)	95.46%	95.34%	3.88
mVulSniffer (BGRU+CNN+Att)	97.41%	97.42%	3.96
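The two tables above report Acc and W_F1 for multi-class detection. This page does not define W_F1; the sketch below assumes it is the support-weighted average of per-class F1 scores, and the labels in the example are made up for illustration rather than taken from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1
    return score


# Hypothetical labels for three vulnerability classes (0, 1, 2).
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]

print(f"Acc  = {accuracy(y_true, y_pred):.4f}")
print(f"W_F1 = {weighted_f1(y_true, y_pred):.4f}")
```

Weighting by support keeps frequent classes from being drowned out by rare ones, which matters when class sizes differ widely.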

Method	FC A	FC R	FC F1	AE A	AE R	AE F1	AU A	AU R	AU F1	PU A	PU R	PU F1
Russell^[13]	91.06%	74.52%	83.17%	90.25%	80.74%	85.27%	89.44%	83.45%	81.48%	90.54%	89.77%	88.12%
μVulDeePecker^[28]	93.60%	74.77%	83.17%	96.25%	85.32%	87.73%	92.74%	85.45%	85.90%	95.54%	94.59%	94.29%
SySeVR-BGRU^[25]	93.20%	81.91%	84.26%	95.62%	80.74%	85.27%	91.28%	85.76%	83.59%	94.40%	91.07%	92.69%
SySeVR-ABGRU^[25]	93.39%	77.10%	83.13%	96.61%	85.41%	88.76%	92.56%	79.46%	84.68%	92.95%	89.63%	90.83%
BERT-base-based^[23]	93.04%	79.94%	83.37%	95.14%	85.65%	84.36%	90.72%	85.46%	83.77%	94.90%	90.83%	92.42%
CodeBERT-based^[23]	93.33%	82.15%	83.94%	95.05%	86.72%	86.14%	91.91%	85.78%	83.70%	94.40%	91.43%	92.72%
mVulSniffer	95.32%	84.08%	88.36%	97.46%	92.60%	91.96%	94.41%	91.69%	89.46%	96.02%	96.90%	94.99%

Method	Devign A	Devign R	Devign F1	REVEAL A	REVEAL R	REVEAL F1
Russell^[13]	45.39%	48.74%	49.40%	60.90%	10.2%	15.48%
μVulDeePecker^[28]	58.39%	24.98%	35.39%	77.91%	34.77%	45.78%
SySeVR-BGRU^[25]	58.76%	55.59%	55.31%	75.54%	26.29%	36.58%
SySeVR-ABGRU^[25]	56.09%	86.50%	64.25%	76.36%	16.86%	27.67%
BERT-base-based^[23]	56.44%	44.10%	47.99%	75.93%	30.34%	40.57%
CodeBERT-based^[23]	57.01%	49.33%	51.12%	76.33%	32.91%	42.96%
mVulSniffer	78.80%	57.53%	71.24%	85.76%	73.27%	80.96%

References 32

[1]	LIU J, SU P R, YANG M, et al. Software and cyber security-a survey[J]. Journal of Software, 2018, 29(1): 42-68.
[2]	WU S. Review and outlook of information security vulnerability analysis[J]. Journal of Tsinghua University (Science and Technology), 2009, 49(S2): 2065-2072.
[3]	WU S Z, GUO T, DONG G W, et al. Software vulnerability analyses: a road map[J]. Journal of Tsinghua University (Science and Technology), 2012, 52(10): 1309-1319.
[4]	BROOKS T N. Survey of automated vulnerability detection and exploit generation techniques in cyber reasoning systems[C]// Proceedings of the Science and Information Conference. Berlin: Springer, 2018: 1083-1102.
[5]	BÖHME M, PHAM V T, ROYCHOUDHURY A. Coverage-based greybox fuzzing as Markov chain[J]. IEEE Transactions on Software Engineering, 2017, 45(5): 489-506.
[6]	STEPHENS N, GROSEN J, SALLS C, et al. Driller: augmenting fuzzing through selective symbolic execution[C]// Proceedings of the Network and Distributed System Security Symposium. Piscataway: IEEE Press, 2016, 16(2016): 1-16.
[7]	ZOU Q C, ZHANG T, WU R P, et al. From automation to intelligence: survey of research on vulnerability discovery techniques[J]. Journal of Tsinghua University, 2018, 58(12): 1079-1094.
[8]	LI Y, HUANG C L, WANG Z F, et al. Survey of software vulnerability mining methods based on machine learning[J]. Journal of Software, 2020, 31(7): 2040-2061.
[9]	WANG Y W, YAO X H, GONG Y Z, et al. A buffer overflow detection algorithm based on static analysis of code[J]. Journal of Computer Research and Development, 2012, 49(4): 839-845.
[10]	DUAN X, WU J Z, LUO T Y, et al. Vulnerability mining method based on code property graph and attention BiLSTM[J]. Journal of Software, 2020, 31(11): 3404-3420.
[11]	YAMAGUCHI F, LINDNER F, RIECK K. Vulnerability extrapolation: assisted discovery of vulnerabilities using machine learning[C]// Proceedings of the 5th USENIX Conference on Offensive Technologies. Berkeley: USENIX Association, 2011: 118-127.
[12]	PARK J, SHIN J, CHOI B. Detection of vulnerabilities by incorrect use of variable using machine learning[J]. Electronics, 2023, 12(5): 1197-1212.
[13]	RUSSELL R, KIM L, HAMILTON L, et al. Automated vulnerability detection in source code using deep representation learning[C]// Proceedings of the 17th IEEE International Conference on Machine Learning and Applications. Piscataway: IEEE Press, 2018: 757-762.
[14]	WANG S, LIU T Y, TAN L. Automatically learning semantic features for defect prediction[C]// Proceedings of the 2016 IEEE/ACM 38th International Conference on Software Engineering. Piscataway: IEEE Press, 2016: 297-308.
[15]	LI J, HE P J, ZHU J M, et al. Software defect prediction via convolutional neural network[C]// Proceedings of the 2017 IEEE International Conference on Software Quality, Reliability and Security. Piscataway: IEEE Press, 2017: 318-328.
[16]	DAM H K, PHAM T, NG S W, et al. A deep tree-based model for software defect prediction[J]. arXiv Preprint, arXiv:1802.00921, 2018.
[17]	KIM J, HUBCZENKO D, MONTAGUE P. Towards attention based vulnerability discovery using source code representation[C]// Proceedings of the International Conference on Artificial Neural Networks. Berlin: Springer, 2019: 731-746.
[18]	HARER J A, KIM L Y, RUSSELL R L, et al. Automated software vulnerability detection with machine learning[J]. arXiv Preprint, arXiv:1803.04497, 2018.
[19]	DUAN X, WU J, JI S, et al. VulSniper: focus your attention to shoot fine-grained vulnerabilities[C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2019: 4665-4671.
[20]	ZHOU Y, LIU S, SIOW J, et al. Devign: effective vulnerability identification by learning comprehensive program semantics via graph neural networks[J]. Advances in Neural Information Processing Systems, 2019, 32: 10197-10207.
[21]	CAO S, SUN X, BO L, et al. BGNN4VD: constructing bidirectional graph neural-network for vulnerability detection[J]. Information and Software Technology, 2021, 136: 106576-106587.
[22]	FAN Y H, WAN C H, FU C, et al. VDoTR: vulnerability detection based on tensor representation of comprehensive code graphs[J]. Computers & Security, 2023, 130: 103247-103259.
[23]	CHANDRA T, SEUNG I J, MUHAMMAD E A, et al. Transformer-based language models for software vulnerability detection[C]// Proceedings of the 38th Annual Computer Security Applications Conference. New York: ACM Press, 2022: 481-496.
[24]	LI Z, ZOU D Q, XU S H, et al. VulDeePecker: a deep learning-based system for vulnerability detection[J]. arXiv Preprint, arXiv:1801.01681, 2018.
[25]	LI Z, ZOU D Q, XU S H, et al. SySeVR: a framework for using deep learning to detect software vulnerabilities[J]. IEEE Transactions on Dependable and Secure Computing, 2022, 19(4): 2244-2258.
[26]	YANG H Y, YANG H Y, ZHANG L, et al. Feature dependence graph based source code loophole detection method[J]. Journal on Communications, 2023, 44(1): 103-117.
[27]	HU Y T, WANG S Y, WU Y M, et al. Slice-level vulnerability detection and interpretation method based on graph neural network[J]. Journal of Software, 2023, 34(6): 2543-2561.
[28]	ZOU D Q, WANG S J, XU S H, et al. μVulDeePecker: a deep learning-based system for multiclass vulnerability detection[J]. IEEE Transactions on Dependable and Secure Computing, 2021, 18(5): 2224-2236.
[29]	AGRAWAL A, MENZIES T. Is “better data” better than “better data miners”? on the benefits of tuning SMOTE for defect prediction[C]// Proceedings of the 40th International Joint Conference on Software Engineering. New York: ACM Press, 2018: 1050-1061.
[30]	FENG Z Y, GUO D Y, TANG D Y, et al. CodeBERT: a pre-trained model for programming and natural languages[C]// Proceedings of Findings of the Association for Computational Linguistics. Stroudsburg: ACL Press, 2020: 1536-1547.
[31]	DENG X, YE W, XIE R, et al. Survey of source code bug detection based on deep learning[J]. Journal of Software, 2023, 34(2): 625-654.
[32]	CHAKRABORTY S, KRISHNA R, DING Y, et al. Deep learning based vulnerability detection: are we there yet?[J]. IEEE Transactions on Software Engineering, 2022, 48(9): 3280-3296.

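Label	CWE-Id	Vulnerability type	Count
1	CWE-404	Improper resource shutdown or release	248
2	CWE-476	NULL pointer dereference	270
3	CWE-119	Buffer errors	2849
4	CWE-706	Improper enforcement of message or data structure	167
5	CWE-665	Improper initialization	289
6	CWE-074	Injection	626
7	CWE-704	Incorrect type conversion	840
8	CWE-311	Missing encryption of sensitive data	118
9	CWE-400	Uncontrolled resource consumption	568
10	CWE-020	Input validation	111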

Dataset	Vulnerable samples	Non-vulnerable samples	Total samples
FC	13 603	50 800	64 403
AE	3 475	18 679	22 154
AU	10 926	31 303	42 229
PU	28 396	263 496	291 892
Devign	11 854	14 124	25 978
REVEAL	2 098	20 050	22 148
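The dataset table above implies quite different class balances across datasets. The short sketch below computes the share of vulnerable samples from the counts in the table; the `vulnerable_share` helper and the dictionary layout are illustrative choices, not part of the paper.

```python
# (vulnerable, non-vulnerable) sample counts copied from the dataset table.
datasets = {
    "FC": (13603, 50800),
    "AE": (3475, 18679),
    "AU": (10926, 31303),
    "PU": (28396, 263496),
    "Devign": (11854, 14124),
    "REVEAL": (2098, 20050),
}


def vulnerable_share(vulnerable, non_vulnerable):
    """Fraction of all samples that are labelled vulnerable."""
    return vulnerable / (vulnerable + non_vulnerable)


shares = {name: vulnerable_share(v, n) for name, (v, n) in datasets.items()}

for name, share in shares.items():
    print(f"{name:7s} {share:.2%}")
```

By this measure Devign is close to balanced, while PU and REVEAL each contain fewer than 10% vulnerable samples.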

Related Articles 15

[1]	Zhiyuan LI, Binglei XU, Yingyi ZHOU. Graph neural network-based address classification method for account balance model blockchain [J]. Journal on Communications, 2023, 44(9): 115-126.
[2]	Mian LI, Yang LI, Zonghui ZHANG, Qingjiang SHI. Communication-efficient distributed precoding design for Massive MIMO [J]. Journal on Communications, 2023, 44(8): 37-48.
[3]	Huijiao WANG, Xin ZHANG, Yongzhuang WEI, Lingchen LI. Novel distinguisher for SM4 cipher algorithm based on deep learning [J]. Journal on Communications, 2023, 44(7): 171-184.
[4]	Rongpeng LI, Bingyan WANG, Honggang ZHANG, Zhifeng ZHAO. Design of knowledge enhanced semantic communication receiver [J]. Journal on Communications, 2023, 44(6): 70-76.
[5]	Dongyu CHEN, Hua CHEN, Limin FAN, Yifang FU, Jian WANG. Research on test strategy for randomness based on deep learning [J]. Journal on Communications, 2023, 44(6): 23-33.
[6]	Shuai MA, Ke PEI, Huayan QI, Hang LI, Wen CAO, Hongmei WANG, Hailiang XIONG, Shiyin LI. Research on geomagnetic indoor high-precision positioning algorithm based on generative model [J]. Journal on Communications, 2023, 44(6): 211-222.
[7]	Jie YANG, Biao DONG, Xue FU, Yu WANG, Guan GUI. Lightweight decentralized learning-based automatic modulation classification method [J]. Journal on Communications, 2022, 43(7): 134-142.
[8]	Xiuzhang YANG, Guojun PENG, Zichuan LI, Yangqi LYU, Side LIU, Chenguang LI. Research on entity recognition and alignment of APT attack based on Bert and BiLSTM-CRF [J]. Journal on Communications, 2022, 43(6): 58-70.
[9]	Yurong LIAO, Haining WANG, Cunbao LIN, Yang LI, Yuqiang FANG, Shuyan NI. Research progress of deep learning-based object detection of optical remote sensing image [J]. Journal on Communications, 2022, 43(5): 190-203.
[10]	Yong LIAO, Shiyi WANG. CSI feedback algorithm based on RM-Net for massive MIMO systems in high-speed mobile environment [J]. Journal on Communications, 2022, 43(5): 166-176.
[11]	Zenghua ZHAO, Yuefan TONG, Jiayang CUI. Device-independent Wi-Fi fingerprinting indoor localization model based on domain adaptation [J]. Journal on Communications, 2022, 43(4): 143-153.
[12]	Hailin FENG, Xiao ZHANG, Tongcun LIU. Recommendation model combining review’s feature and rating graph convolutional representation [J]. Journal on Communications, 2022, 43(3): 164-171.
[13]	Yong LIAO, Gang CHENG, Yujie LI. CSI feedback algorithm based on deep unfolding for massive MIMO systems [J]. Journal on Communications, 2022, 43(12): 77-88.
[14]	Zhuo CHEN, Miao ZHU, Junwei DU. Multi-view graph neural network for fraud detection algorithm [J]. Journal on Communications, 2022, 43(11): 225-232.
[15]	Junyan HUO, Ruipeng QIU, Yanzhuo MA, Fuzheng YANG. Reference frame list optimization algorithm in video coding by quality enhancement of the nearest picture [J]. Journal on Communications, 2022, 43(11): 136-147.