Chinese Journal of Network and Information Security, 2023, Vol. 9, Issue (6): 86-101. doi: 10.11959/j.issn.2096-109x.2023085
• Academic Paper •
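Yan LI1,2,3, Weizhong QIANG1,2,3, Zhen LI1,2,3, Deqing ZOU1,2,3, Hai JIN1,4
Revised:
2023-07-28
Online:
2023-12-01
Published:
2023-12-01
About the author:
Yan LI (1998- ), female, from Weinan, Shaanxi, is a master's student at Huazhong University of Science and Technology. Her main research interests are deep learning and vulnerability detection.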
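Abstract:
In recent years, security incidents caused by software vulnerabilities have occurred one after another, and finding and patching vulnerabilities early can effectively reduce losses. Traditional rule-based vulnerability detection methods rely on expert-defined rules and suffer from a high false negative rate. Deep learning-based methods can automatically learn the latent features of vulnerable programs; however, as software grows more complex, their precision drops on real-world software. On the one hand, most existing methods work at the function level and cannot handle vulnerability samples that span functions. On the other hand, models such as BGRU and BLSTM degrade when the input sequence is too long and are poor at capturing long-term dependencies between program statements. To address these problems, the existing program slicing method is optimized: intra-procedural and inter-procedural slicing are combined to perform a comprehensive context analysis of cross-function vulnerabilities and capture the complete causal chain that triggers a vulnerability. A Transformer neural network model with a multi-head attention mechanism is applied to the vulnerability detection task. It jointly attends to information from different representation subspaces to extract deep features of nodes, resolves the information decay problem of recurrent neural networks, and learns the syntactic and semantic information of source programs more effectively. Experimental results show that the method achieves an F1 score of 73.4% on a real-world software dataset, an improvement of 13.6% to 40.8% over the compared methods, and that it successfully detects several vulnerabilities in open-source software, which demonstrates its effectiveness and practicality.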
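The difference between intra-procedural and inter-procedural slicing can be illustrated with a minimal sketch. It is not the slicing algorithm of the paper: the toy program, the `function:line` statement labels, the dependence edges, and the function name `backward_slice` are all assumptions made for illustration. The sketch walks dependence edges backward from a sink statement, first inside one function only and then across the call boundary.

```python
from collections import deque

# Hypothetical dependence edges for a toy two-function C program.
# Each statement ("function:line") maps to the statements it depends on.
#   main:1  char buf[16];
#   main:2  int len = read_len();
#   main:3  copy(buf, len);
#   copy:1  void copy(char *dst, int n)   (parameter definitions)
#   copy:2  memset(dst, 0, n);            (sink)
INTRA_DEPS = {
    "main:3": ["main:1", "main:2"],
    "copy:2": ["copy:1"],
}
# Inter-procedural edges link callee parameters back to the call site.
INTER_DEPS = {
    "copy:1": ["main:3"],
}


def backward_slice(sink, intra_deps, inter_deps=None):
    """Collect every statement the sink transitively depends on."""
    inter_deps = inter_deps or {}
    seen = {sink}
    queue = deque([sink])
    while queue:
        stmt = queue.popleft()
        # Follow dependences inside the function and, if given, across calls.
        preds = intra_deps.get(stmt, []) + inter_deps.get(stmt, [])
        for pred in preds:
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return sorted(seen)


print(backward_slice("copy:2", INTRA_DEPS))
print(backward_slice("copy:2", INTRA_DEPS, INTER_DEPS))
```

With intra-procedural edges only, the slice stops at the parameters of `copy`, so the origin of `n` is invisible. Adding the inter-procedural edge pulls in the three statements of `main`, including the unchecked `len` that determines the size passed to `memset`.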
Yan LI, Weizhong QIANG, Zhen LI, Deqing ZOU, Hai JIN. Deep learning vulnerability detection method based on optimized inter-procedural semantics of programs[J]. Chinese Journal of Network and Information Security, 2023, 9(6): 86-101.
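Table 1 Description of vulnerability types
CWE-ID | Vulnerability type description |
CWE-119 | Buffer overflow: improper operations on a memory buffer that read or write memory locations outside the buffer's intended bounds |
CWE-125 | Out-of-bounds read: the software reads data past the end, or before the beginning, of the intended buffer |
CWE-787 | Out-of-bounds write: the software writes data past the end, or before the beginning, of the intended buffer |
CWE-189 | Numeric errors: weaknesses related to incorrect numeric calculation or conversion |
CWE-190 | Integer overflow or wraparound: the software performs a calculation that can overflow or wrap around while the logic assumes the result is always larger than the original value. Other vulnerabilities may be introduced when the result is used for resource management or execution control |
CWE-20 | Improper input validation: the software receives input or data but does not validate, or incorrectly validates, that the input is safe and correctly processed. An attacker may exploit this to alter control flow, control arbitrary resources, or execute arbitrary code |
CWE-369 | Divide by zero: often occurs in calculations involving physical dimensions such as length, width, and height, and may crash the system |
CWE-415 | Double free: free() is called twice on the same memory address, which may lead to modification of unexpected memory locations |
CWE-416 | Use after free: memory is referenced after it has been freed, which may cause a crash, use of unexpected values, or code execution |
CWE-476 | NULL pointer dereference: the program dereferences a pointer that it expects to be valid but that is NULL, which typically causes a crash or exit |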
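Table 3 Features of critical variables and sink statements related to vulnerability type
Vulnerability category | CWE-ID | Features of critical variables and sink statements |
Improper memory operations | CWE-119, CWE-125, CWE-787, CWE-20 | Memory-sensitive APIs such as malloc, memcpy, and memset, where the critical variables are the parameters related to memory size; array use, where the critical variable is the array subscript; pointer use, where the critical variable is the pointer itself or a parameter involved in pointer arithmetic |
Improper numeric operations | CWE-189, CWE-190 | Integer arithmetic, where the critical variables are the operands of the integer operation; may further lead to buffer overflow vulnerabilities |
Improper numeric operations | CWE-369 | Division or modulo operations, where the critical variable is the divisor |
Improper pointer use | CWE-415 | Double free, where the critical variable is the pointer; the vulnerability is triggered when free/delete is called a second time and releases the same memory again |
Improper pointer use | CWE-416 | Use after free, where the critical variable is the pointer to the freed memory (a dangling pointer); the vulnerability is triggered when the pointer is used again after free/delete has released the memory |
Improper pointer use | CWE-476 | NULL pointer dereference, where the critical variable is a pointer that is uninitialized or assigned NULL, such as an uninitialized structure or function pointer; the vulnerability is triggered on its first use |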
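Three of the features in Table 3 can be approximated with simple pattern matching, as in the sketch below. It is a rough heuristic written for illustration, not the extraction procedure of the paper: the function name `find_sinks`, the sink labels, and the assumption that the size argument of a memory API is its last argument are all hypothetical. The regular expressions ignore nested calls, string literals, and comments.

```python
import re

# The memory-sensitive APIs named in Table 3.
MEMORY_APIS = ("malloc", "memcpy", "memset")


def find_sinks(line):
    """Return (sink kind, critical variable) pairs found in one C statement."""
    sinks = []
    # Memory-sensitive API call: assume the size-related argument is the last one.
    call = re.search(r"\b(%s)\s*\(([^()]*)\)" % "|".join(MEMORY_APIS), line)
    if call:
        sinks.append(("memory_api", call.group(2).split(",")[-1].strip()))
    # Array use: the critical variable is the subscript (identifiers only).
    for index in re.findall(r"\w+\s*\[\s*([A-Za-z_]\w*)\s*\]", line):
        sinks.append(("array_index", index))
    # Division or modulo: the critical variable is the divisor.
    for divisor in re.findall(r"[/%]\s*([A-Za-z_]\w*)", line):
        sinks.append(("division", divisor))
    return sinks


for statement in ("memcpy(dst, src, n);", "buf[i] = 0;", "avg = total / count;"):
    print(statement, find_sinks(statement))
```

For the three sample statements the critical variables found are `n`, `i`, and `count`. A practical implementation would work on a parsed program representation rather than on raw text, which also makes the pointer-related features of Table 3 tractable.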
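Table 10 List of vulnerabilities found in openGauss and verified by manual analysis
Vulnerability type | Vulnerable file path | Root cause analysis |
CWE-476 | src/storage/gstor/zekernel/kernel/table/*.c | A pointer that has been set to NULL may be dereferenced; it needs to be initialized before use |
CWE-415 | src/gausskernel/optimizer/commands/*.cpp | A double free risk exists; the pointer should be set to NULL promptly after it is freed |
CWE-369 | src/common/backend/utils/adt/*.cpp | The divisor is initialized to 0, but it is not checked for zero before the division |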
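The CWE-415 advice in Table 10, setting a pointer to NULL once it has been freed, can be made concrete with a small rule-based check. This is only an illustrative heuristic and is unrelated to the deep learning detector described in the abstract: the function name `find_double_free` is hypothetical, and the scan assumes straight-line C code with one statement per line and no branches or aliases.

```python
import re


def find_double_free(lines):
    """Return 1-based line numbers where an already freed pointer is freed again."""
    freed = set()  # pointers freed and not reassigned since
    hits = []
    for number, line in enumerate(lines, start=1):
        call = re.search(r"\bfree\s*\(\s*(\w+)\s*\)", line)
        if call:
            name = call.group(1)
            if name in freed:
                hits.append(number)
            freed.add(name)
            continue
        assign = re.match(r"\s*(\w+)\s*=[^=]", line)
        if assign:
            # Reassignment (for example to NULL) clears the freed state.
            freed.discard(assign.group(1))
    return hits


risky = ["free(p);", "free(p);"]
fixed = ["free(p);", "p = NULL;", "free(p);"]
print(find_double_free(risky))
print(find_double_free(fixed))
```

The second sequence is not flagged because calling `free` on a NULL pointer is a no-op in C, which is why resetting the pointer removes the risk.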