Survey on static software vulnerability detection for source code

Chinese Journal of Network and Information Security, 2019, Vol. 5, Issue (1): 1-14. doi: 10.11959/j.issn.2096-109x.2019001

Review

Zhen LI¹,²,³,⁴; Deqing ZOU¹,²,³,⁴,⁵; Zeli WANG¹,²,³,⁴; Hai JIN¹,²,³,⁴

¹ School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
² Services Computing Technology and System Lab, Huazhong University of Science and Technology, Wuhan 430074, China
³ Clusters and Grid Computing Lab, Huazhong University of Science and Technology, Wuhan 430074, China
⁴ Shenzhen Huazhong University of Science and Technology Research Institute, Shenzhen 518057, China
⁵ Shenzhen Huazhong University of Science and Technology Research Institute, Shenzhen 518057, China

Revised: 2018-12-26 Online: 2019-02-01 Published: 2019-04-10
About the authors:
Zhen LI (1981- ), female, born in Baoding, Hebei, is a PhD candidate at Huazhong University of Science and Technology. Her main research interests are software security and vulnerability detection.
Deqing ZOU (1975- ), male, born in Xiangtan, Hunan, is a professor and doctoral supervisor at Huazhong University of Science and Technology. His main research interests are cloud computing security, network attack and defense and vulnerability detection, software-defined security and active defense, big data security and artificial intelligence security, and fault-tolerant computing.
Zeli WANG (1995- ), female, born in Xiangyang, Hubei, is a PhD candidate at Huazhong University of Science and Technology. Her main research interests are blockchain system security and smart contract security.
Hai JIN (1966- ), male, born in Shanghai, is a professor and doctoral supervisor at Huazhong University of Science and Technology. His main research interests are computer architecture, virtualization technology, cluster computing, grid computing, parallel and distributed computing, peer-to-peer computing, pervasive computing, the semantic web, and storage and security.
Supported by:
The Ministry of Science and Technology’s “Network Space Security” Key Special Project (2017YFB0802205); The National Natural Science Foundation of China (61672249); The Shenzhen Fundamental Research Program (JCYJ20170413114215614)

Abstract

Static software vulnerability detection falls into two main types according to the object being analyzed: vulnerability detection for binary code and vulnerability detection for source code. Because source code carries richer semantic information, it is favored by code auditors. Existing research on vulnerability detection for source code is summarized from four aspects: code similarity-based vulnerability detection, symbolic execution-based vulnerability detection, rule-based vulnerability detection, and machine learning-based vulnerability detection. Two concrete schemes, a vulnerability detection system based on source code similarity and an intelligent software vulnerability detection system for source code, are taken as examples to describe the vulnerability detection process in detail.

Key words: software vulnerability, vulnerability detection for source code, code similarity, deep learning
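
As a rough illustration of the first category named in the abstract, code similarity-based detection, the sketch below compares a candidate fragment against a known vulnerable fragment. It is not the system described in the paper: the tokenizer, the `KEEP` set, the n-gram size, and the sample fragments are all assumptions chosen to keep the example small. Identifiers are replaced by a placeholder so that renamed variables still match, and similarity is the Jaccard overlap of token 3-grams.

```python
import re

# Names kept verbatim; every other identifier becomes "ID" (an assumption of this sketch).
KEEP = {"if", "else", "for", "while", "return", "char", "int",
        "strcpy", "strncpy", "sizeof"}

def tokens(code):
    """Split C-like code into tokens and abstract away user-chosen identifiers."""
    raw = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    return [t if (t in KEEP or not re.match(r"[A-Za-z_]", t)) else "ID" for t in raw]

def shingles(code, n=3):
    """Return the set of n-token windows of the normalized token stream."""
    toks = tokens(code)
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def similarity(a, b, n=3):
    """Jaccard similarity between the shingle sets of two fragments."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

# Hypothetical fragments: a known vulnerable pattern, a renamed clone, a patched variant.
KNOWN_VULNERABLE = "char buf[64]; strcpy(buf, input);"
CLONE = "char tmp[64]; strcpy(tmp, user_data);"
PATCHED = "char buf[64]; strncpy(buf, input, sizeof(buf) - 1);"

clone_score = similarity(KNOWN_VULNERABLE, CLONE)
patched_score = similarity(KNOWN_VULNERABLE, PATCHED)
```

In this sketch the renamed clone scores higher against the known vulnerable fragment than the patched variant does, which is the property a similarity-based detector relies on. A practical detector would also need a threshold and a way to confirm that the matched code is actually unpatched.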

CLC number: TP393

Zhen LI, Deqing ZOU, Zeli WANG, Hai JIN. Survey on static software vulnerability detection for source code[J]. Chinese Journal of Network and Information Security, 2019, 5(1): 1-14.

Figures and tables (4): Table 1, Figure 1, Figure 2, Figure 3


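Table 1

Type	Description
1	Basic features
1-1	CVE ID
1-2	CWE ID
1-3	Product vendor
1-4	Affected product
1-5	Vulnerability severity
2	Non-substantive features
2-1	Changes to whitespace, formatting, or comments
3	Element features
3-1	Modify variable name
3-2	Modify constant
3-3	Modify variable type
3-4	Modify function name
3-5	Add function parameter
3-6	Delete function parameter
3-7	Modify function parameter
3-8	Add function declaration
3-9	Delete function declaration
3-10	Modify operator
4	Expression features
4-1	Modify assignment expression
4-2	Modify if condition
4-3	Modify for condition
4-4	Modify while condition
4-5	Modify do while condition
4-6	Modify switch condition
5	Statement features
5-1	Add lines
5-2	Delete lines
5-3	Move lines
6	Function features
6-1	Add entire function
6-2	Delete entire function
6-3	Changes outside functions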
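
A few of the type codes in the table above can be illustrated with a small sketch, assuming the table classifies the kinds of change found between the old and new versions of a code fragment (the page itself gives no caption). The function names and sample fragments are hypothetical, and only types 2-1, 5-1, and 5-2 are recognized; everything else is reported as "other". The comparison uses Python's standard `difflib` module.

```python
import difflib
import re

def strip_trivia(line):
    """Remove single-line C comments and all whitespace from one line."""
    line = re.sub(r"//.*|/\*.*?\*/", "", line)
    return re.sub(r"\s+", "", line)

def classify_changes(old, new):
    """Label each changed region between two code versions with a type code."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    labels = set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Compare the changed regions with comments and whitespace removed.
        before = [s for s in map(strip_trivia, old_lines[i1:i2]) if s]
        after = [s for s in map(strip_trivia, new_lines[j1:j2]) if s]
        if before == after:
            labels.add("2-1")    # only whitespace, formatting, or comments changed
        elif tag == "insert":
            labels.add("5-1")    # lines added
        elif tag == "delete":
            labels.add("5-2")    # lines deleted
        else:
            labels.add("other")  # finer-grained types are not handled in this sketch
    return labels

# Hypothetical example: a patch that adds a bounds check before memcpy.
OLD = "int n = len;\nmemcpy(dst, src, n);\n"
NEW = "int n = len;\nif (n > MAX) return -1;\nmemcpy(dst, src, n);\n"
added = classify_changes(OLD, NEW)
cosmetic = classify_changes("x=1;\n", "x = 1;  // init\n")
```

Recognizing the element and expression types (3-x, 4-x) would need a parser rather than a line-level comparison, so this sketch leaves them out.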
