Source code vulnerability detection method based on feature dependency graph

doi:10.11959/j.issn.1000-436x.2023018

Abstract

Existing source code vulnerability detection methods do not explicitly preserve the vulnerability-related semantic information in the source code, which makes the features of vulnerable statements hard to extract and leads to a high false positive rate. To address this problem, a source code vulnerability detection method based on the feature dependency graph was proposed. First, the candidate vulnerability statements in each function slice were extracted, and the feature dependency graph was generated by analyzing the control dependency chain and data dependency chain of the candidate vulnerability statements. Second, a word vector model was used to generate the initial node representation vectors of the feature dependency graph. Finally, a vulnerability detection neural network oriented to the feature dependency graph was constructed, in which the graph learning network learned the heterogeneous neighbor node information of the feature dependency graph, and the detection network extracted global features and performed vulnerability detection. The experimental results show that the recall rate and F1 score of the proposed method are improved by 1.50% to 22.32% and 1.86% to 16.69%, respectively, outperforming the existing methods.
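
The first step of the abstract (growing a graph around a candidate statement by following its control and data dependency chains) can be illustrated with a minimal sketch. This page does not give the paper's actual construction rules, so everything here is an assumption made for illustration: the function name `build_fdg`, the dictionary encoding of dependencies, the backward breadth-first traversal, and the toy slice. Edges are tagged with their kind, which is one plausible reading of the "heterogeneous" neighbor information mentioned above.

```python
from collections import deque


def build_fdg(candidate, control_deps, data_deps):
    """Collect the statements a candidate depends on, directly or transitively.

    control_deps and data_deps map a statement id to the ids it depends on.
    Returns (nodes, edges); each edge is (statement, dependency, kind).
    Hypothetical helper, not the paper's algorithm.
    """
    nodes = {candidate}
    edges = set()
    queue = deque([candidate])
    while queue:
        stmt = queue.popleft()
        for kind, deps in (("control", control_deps), ("data", data_deps)):
            for parent in deps.get(stmt, ()):
                edges.add((stmt, parent, kind))
                if parent not in nodes:
                    nodes.add(parent)
                    queue.append(parent)
    return nodes, edges


# Toy function slice (illustrative only):
#   1: int n = read_len();
#   2: if (n < MAX)
#   3:     buf[n] = 0;      <- candidate vulnerability statement
#   4: log("done");
control_deps = {3: [2]}          # statement 3 executes only if 2 holds
data_deps = {2: [1], 3: [1]}     # statements 2 and 3 both read n from 1

nodes, edges = build_fdg(3, control_deps, data_deps)
```

In this toy slice, statement 4 has no dependency relation with the candidate, so it is left out of the graph, while statements 1 and 2 are kept together with typed edges.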

Key words: source code, vulnerability detection, semantic information, dependency graph, neural network

CLC Number: TP393

Hongyu YANG, Haiyun YANG, Liang ZHANG, Xiang CHENG. Feature dependence graph based source code loophole detection method[J]. Journal on Communications, 2023, 44(1): 103-117.

Figures/Tables 21

Method	CWE20	CWE78	CWE129	CWE190	CWE400	CWE787	CWE789
Russell	72.45%	92.71%	85.97%	87.49%	88.58%	74.42%	89.24%
VulDeePecker	75.19%	95.71%	83.31%	86.97%	91.26%	75.62%	89.85%
μVulDeePecker	75.93%	95.95%	89.41%	87.52%	93.71%	82.38%	89.59%
SySeVR	78.58%	97.01%	90.25%	93.00%	94.59%	85.69%	91.90%
VulDeeLocator	76.68%	97.06%	91.37%	93.39%	95.28%	84.65%	90.94%
Devign	77.93%	96.66%	90.19%	95.53%	96.47%	85.09%	91.97%
Reveal	78.93%	97.28%	93.49%	97.32%	97.99%	85.58%	92.46%
FBLD	81.42%	98.37%	96.24%	98.17%	97.93%	87.59%	94.72%

Method	CWE20	CWE78	CWE129	CWE190	CWE400	CWE787	CWE789
Russell	38.18%	87.01%	79.88%	85.12%	77.78%	53.44%	67.43%
VulDeePecker	42.86%	95.96%	75.61%	82.53%	83.72%	55.56%	68.93%
μVulDeePecker	44.98%	88.31%	84.61%	75.21%	85.57%	71.73%	66.27%
SySeVR	49.36%	92.94%	83.33%	90.16%	88.64%	66.75%	75.37%
VulDeeLocator	46.35%	92.44%	85.56%	85.45%	88.95%	66.07%	69.11%
Devign	48.18%	91.83%	83.10%	94.19%	92.59%	65.82%	75.55%
Reveal	50.00%	93.02%	84.77%	94.89%	93.35%	66.62%	76.60%
FBLD	54.55%	94.31%	90.63%	97.00%	93.33%	69.23%	83.92%

Method	CWE20	CWE78	CWE129	CWE190	CWE400	CWE787	CWE789
Russell	49.65%	79.76%	70.54%	57.83%	83.14%	51.02%	56.41%
VulDeePecker	53.19%	84.76%	67.18%	57.76%	85.51%	54.66%	60.44%
μVulDeePecker	62.31%	94.79%	78.67%	71.36%	92.67%	61.59%	64.14%
SySeVR	63.83%	94.04%	83.98%	79.42%	92.64%	95.12%	69.30%
VulDeeLocator	65.84%	94.88%	85.39%	87.25%	94.50%	93.28%	73.01%
Devign	62.65%	93.67%	84.08%	86.64%	95.04%	94.75%	69.70%
Reveal	65.01%	95.23%	95.20%	93.86%	99.98%	94.90%	72.52%
FBLD	70.92%	98.80%	97.41%	95.27%	99.76%	98.40%	80.74%

Method	CWE20	CWE78	CWE129	CWE190	CWE400	CWE787	CWE789
Russell	43.17%	83.23%	74.92%	68.87%	80.37%	52.20%	61.43%
VulDeePecker	47.47%	90.01%	71.15%	67.96%	84.61%	55.11%	64.40%
μVulDeePecker	52.25%	91.43%	81.53%	73.24%	88.98%	66.27%	65.19%
SySeVR	55.67%	93.49%	83.66%	84.45%	90.59%	78.45%	72.21%
VulDeeLocator	54.40%	93.64%	85.47%	86.34%	91.64%	77.35%	71.00%
Devign	54.47%	92.74%	83.59%	90.26%	93.79%	77.68%	72.51%
Reveal	56.53%	94.12%	89.68%	94.37%	96.55%	78.28%	74.50%
FBLD	61.67%	96.51%	93.89%	96.13%	96.44%	81.27%	82.30%
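
The four per-CWE tables above carry no captions in this extract. As a rough consistency check, and only under the assumption that the second and third tables report precision and recall, the fourth table matches the F1 score computed as their harmonic mean. The sketch below uses three cells copied from those tables; the names `f1_score` and `samples` are introduced here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


# (value from 2nd table, value from 3rd table, value from 4th table), in percent.
# Treating the 2nd and 3rd tables as precision and recall is an assumption.
samples = {
    "Russell / CWE20": (38.18, 49.65, 43.17),
    "FBLD / CWE20": (54.55, 70.92, 61.67),
    "FBLD / CWE789": (83.92, 80.74, 82.30),
}

# Absolute gap between the recomputed F1 and the tabulated value.
differences = {
    name: abs(f1_score(p, r) - reported)
    for name, (p, r, reported) in samples.items()
}

for name, diff in differences.items():
    print(f"{name}: |recomputed - reported| = {diff:.3f}")
```

For these three cells the recomputed values agree with the fourth table to within rounding, which supports, but does not prove, the assumed reading of the tables.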

Method	Training time/min	Average detection time/s
Russell	18	0.95
VulDeePecker	24	1.24
μVulDeePecker	29	1.78
SySeVR	36	2.25
VulDeeLocator	32	2.12
Devign	49	3.40
Reveal	56	3.51
FBLD	87	4.89

References 18

[1]	LIN G J, WEN S, HAN Q L, et al. Software vulnerability detection using deep neural networks: a survey[J]. Proceedings of the IEEE, 2020, 108(10): 1825-1848.
[2]	MIAO Y T, CHEN C, PAN L, et al. Machine learning-based cyber attacks targeting on controlled information[J]. ACM Computing Surveys, 2022, 54(7): 1-36.
[3]	LI Z, ZOU D Q, XU S H, et al. SySeVR: a framework for using deep learning to detect software vulnerabilities[J]. IEEE Transactions on Dependable and Secure Computing, 2022, 19(4): 2244-2258.
[4]	ZHANG J, PAN L, HAN Q L, et al. Deep learning based attack detection for cyber-physical system cybersecurity: a survey[J]. IEEE/CAA Journal of Automatica Sinica, 2022, 9(3): 377-391.
[5]	QIU J Y, ZHANG J, LUO W, et al. A survey of Android malware detection with deep neural models[J]. ACM Computing Surveys, 2021, 53(6): 1-36.
[6]	WANG H T, YE G X, TANG Z Y, et al. Combining graph-based learning with automated data collection for code vulnerability detection[J]. IEEE Transactions on Information Forensics and Security, 2021, 16: 1943-1958.
[7]	CHAKRABORTY S, KRISHNA R, DING Y, et al. Deep learning based vulnerability detection: are we there yet?[J]. IEEE Transactions on Software Engineering, 2022, 48(9): 3280-3296.
[8]	YAMAGUCHI F, GOLDE N, ARP D, et al. Modeling and discovering vulnerabilities with code property graphs[C]// Proceedings of 2014 IEEE Symposium on Security and Privacy. Piscataway: IEEE Press, 2014: 590-604.
[9]	RUSSELL R, KIM L, HAMILTON L, et al. Automated vulnerability detection in source code using deep representation learning[C]// Proceedings of 2018 17th IEEE International Conference on Machine Learning and Applications. Piscataway: IEEE Press, 2018: 757-762.
[10]	LI Z, ZOU D Q, XU S H, et al. VulDeePecker: a deep learning-based system for vulnerability detection[J]. arXiv Preprint, arXiv:1801.01681, 2018.
[11]	ZOU D Q, WANG S J, XU S H, et al. μVulDeePecker: a deep learning-based system for multiclass vulnerability detection[J]. IEEE Transactions on Dependable and Secure Computing, 2021, 18(5): 2224-2236.
[12]	LI Z, ZOU D Q, XU S H, et al. VulDeeLocator: a deep learning-based fine-grained vulnerability detector[J]. IEEE Transactions on Dependable and Secure Computing, 2022, 19(4): 2821-2837.
[13]	ZHOU Y, LIU S, SIOW J, et al. Devign: effective vulnerability identification by learning comprehensive program semantics via graph neural networks[J]. Advances in Neural Information Processing Systems, 2019, 32(1): 10197-10207.
[14]	LI Y J, TARLOW D, BROCKSCHMIDT M, et al. Gated graph sequence neural networks[J]. arXiv Preprint, arXiv:1511.05493, 2015.
[15]	ALLAMANIS M, BROCKSCHMIDT M, KHADEMI M. Learning to represent programs with graphs[J]. arXiv Preprint, arXiv:1711.00740, 2017.
[16]	WU Z H, PAN S R, CHEN F W, et al. A comprehensive survey on graph neural networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(1): 4-24.
[17]	HERREMANS D, CHUAN C H. Modeling musical context with Word2Vec[J]. arXiv Preprint, arXiv:1706.09088, 2017.
[18]	HU Z N, DONG Y X, WANG K S, et al. Heterogeneous graph transformer[C]// Proceedings of The Web Conference 2020. New York: ACM Press, 2020: 2704-2710.

Type	Name	Function slices	FDGs	Vulnerable FDGs	Benign FDGs
CWE20	Improper input validation	3 452	4 015	846	3 169
CWE78	Command injection	17 000	18 420	4 200	14 220
CWE129	Improper validation of array index	11 208	10 019	2 977	7 042
CWE190	Integer overflow	25 913	28 943	6 925	22 018
CWE400	Resource exhaustion	9 990	10 023	2 744	7 279
CWE787	Out-of-bounds write	14 797	14 980	4 210	10 770
CWE789	Uncontrolled memory allocation	7 300	8 167	1 241	6 926

Dataset	Function slices	FDGs	Vulnerable FDGs	Benign FDGs
Devign	27 313	28 760	13 754	15 006
Reveal	22 725	20 109	2 753	17 356
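
The two dataset tables above can be checked arithmetically: in every row the vulnerable and benign graph counts add up to the FDG total, and the share of vulnerable graphs differs considerably between datasets. The sketch below recomputes this for a few rows copied from the tables; the variable names and the choice of rows are illustrative, not part of the original study.

```python
# (total FDGs, vulnerable FDGs, benign FDGs), copied from the tables above.
datasets = {
    "CWE20": (4015, 846, 3169),
    "CWE789": (8167, 1241, 6926),
    "Devign": (28760, 13754, 15006),
    "Reveal": (20109, 2753, 17356),
}

vulnerable_share = {}
for name, (total, vulnerable, benign) in datasets.items():
    # Each row should be internally consistent.
    assert vulnerable + benign == total, f"row {name} does not add up"
    vulnerable_share[name] = vulnerable / total

for name, share in vulnerable_share.items():
    print(f"{name}: {share:.1%} of FDGs are labeled vulnerable")
```

Under these counts, Devign is close to balanced while Reveal is heavily skewed toward benign graphs, which helps explain why precision, recall, and F1 are more informative here than accuracy alone.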

Item	Minimum	Median	75th percentile	Maximum
Number of nodes	4	13	21	306
Number of edges	6	65	132	6 307
Time consumed/s	0.001	2.53	14.19	78.96
