Chinese Journal of Network and Information Security ›› 2022, Vol. 8 ›› Issue (1): 52-62. doi: 10.11959/j.issn.2096-109x.2021094
Cheng HUANG1,2, Mingxu SUN1, Renyu DUAN1, Susheng WU1, Bin CHEN1
Revised:
2021-10-12
Online:
2022-02-15
Published:
2022-02-01
About the author:
Cheng HUANG (1987- ), male, born in Chongqing, is an associate professor at Sichuan University. His main research interests include cyberspace security, attack detection, threat attribution, data mining, social networks, machine learning, and natural language processing.
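Abstract:
Open source code hosting platforms have brought vitality and opportunity to the software development industry, but they also carry many security risks. Irregularities in open source code, the complexity of project dependency libraries, and the passive way in which vulnerability disclosure platforms collect vulnerabilities all affect the security of open source projects and of closed source projects that incorporate open source components. Most vulnerability fixes cannot be noticed and identified in time, which directly exposes the security risks of these projects to attackers. To discover vulnerability fixes in open source projects comprehensively and promptly, VpatchFinder, a vulnerability identification system based on project version differences, was designed and implemented. The system automatically obtains the updated code and content data of open source projects, then extracts and analyzes the code and textual descriptions before and after each update. Differential features based on security behavior and code characteristics were proposed, and a feature set of 40 features was built from four groups: the project comment information feature group, the page statistics feature group, the code statistics feature group, and the vulnerability type feature group. A random forest algorithm was used to train a classifier that identifies vulnerabilities. In tests on real vulnerability data, VpatchFinder achieved a precision of 84.35%, an accuracy of 85.46%, and a recall of 85.09%, outperforming other common machine learning models. A further experiment on a collected set of CVE vulnerabilities in open source software from past years showed that 68.07% of the software vulnerabilities could be discovered in advance by VpatchFinder. These results can provide effective technical support for software security architecture design, development, and composition analysis.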
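The abstract reports precision, accuracy, and recall as three separate figures. As a reminder of how they differ, here is a minimal sketch that computes all three from the counts of a binary confusion matrix. The function name and the example counts are illustrative assumptions and are not data from the paper.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute precision, recall, and accuracy from binary confusion-matrix counts.

    tp: vulnerability fixes correctly flagged
    fp: ordinary commits wrongly flagged
    tn: ordinary commits correctly ignored
    fn: vulnerability fixes that were missed
    """
    total = tp + fp + tn + fn
    return {
        # Of everything flagged, how much was a real vulnerability fix?
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of all real vulnerability fixes, how many were flagged?
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Of all commits, how many were classified correctly?
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Made-up counts for illustration only (not the paper's results).
example = classification_metrics(tp=80, fp=20, tn=90, fn=10)
```

With these made-up counts, precision is 80/100, recall is 80/90, and accuracy is 170/200, so the three values need not coincide even for the same classifier.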
Cheng HUANG, Mingxu SUN, Renyu DUAN, Susheng WU, Bin CHEN. Vulnerability identification technology research based on project version difference[J]. Chinese Journal of Network and Information Security, 2022, 8(1): 52-62.
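Table 1 Summary of all the extracted features
Feature group | Feature name | Symbol | Feature No. | Source |
Comment information feature group α | Security keyword count in Subject | αswc | 1 | First proposed |
Comment information feature group α | Non-security keyword count in Subject | αnswc | 2 | First proposed |
Page statistics feature group β | Number of changed files | βcfn | 3 | From the literature |
Page statistics feature group β | Number of changed hunks | βccn | 4 | From the literature |
Page statistics feature group β | Number of changed lines | βcln | 5~10 | From the literature |
Page statistics feature group β | Number of changed characters | βccn | 11~16 | From the literature |
Page statistics feature group β | Similarity between added and removed code | βars | 17~19 | From the literature |
Page statistics feature group β | Maximum number of occurrences of the same code change | βmsn | 20 | First proposed |
Page statistics feature group β | Patch file size | βfis | 21 | First proposed |
Code statistics feature group γ | Number of changed conditional statements | γccn | 22~27 | From the literature |
Code statistics feature group γ | Number of changed loop statements | γcln | 28~33 | From the literature |
Code statistics feature group γ | Total number of changed arithmetic, logical, and relational operators | γcon | 34~39 | First proposed |
Vulnerability type feature group δ | Count of vulnerability-related key functions in changed code | δcwc | 40 | First proposed |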
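To make a few of the Table 1 features concrete, the following is a minimal sketch of how they might be computed from a commit subject and a unified diff. It is not the VpatchFinder implementation. The keyword list, the function name `extract_features`, the dictionary keys, the byte unit for patch size, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
import difflib
import re

# Hypothetical keyword list; the paper's actual security vocabulary is not given here.
SECURITY_KEYWORDS = ("overflow", "leak", "null pointer", "cve", "vulnerab")

# Rough token match for C-style conditionals (an assumption, not the paper's parser).
CONDITIONAL = re.compile(r"\b(if|else|switch|case)\b")


def extract_features(subject: str, patch: str) -> dict:
    """Compute a handful of Table 1 style features from a subject line and a unified diff."""
    added, removed = [], []
    files = hunks = 0
    in_hunk = False
    for line in patch.splitlines():
        if line.startswith("diff --git "):
            files += 1
            in_hunk = False  # the ---/+++ file header lines follow, not code
        elif line.startswith("@@"):
            hunks += 1
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            added.append(line[1:].strip())
        elif in_hunk and line.startswith("-"):
            removed.append(line[1:].strip())
    subject_lower = subject.lower()
    return {
        "security_keywords": sum(subject_lower.count(k) for k in SECURITY_KEYWORDS),
        "changed_files": files,
        "changed_hunks": hunks,
        "added_lines": len(added),
        "removed_lines": len(removed),
        "changed_conditionals": sum(len(CONDITIONAL.findall(l)) for l in added + removed),
        # Stand-in similarity measure in [0, 1]; the paper may use a different one.
        "add_remove_similarity": difflib.SequenceMatcher(
            None, "\n".join(removed), "\n".join(added)
        ).ratio(),
        "patch_size_bytes": len(patch.encode("utf-8")),
    }


# A small made-up patch that adds a bounds check before a copy.
EXAMPLE_PATCH = """\
diff --git a/src/parse.c b/src/parse.c
--- a/src/parse.c
+++ b/src/parse.c
@@ -10,3 +10,5 @@ int parse(const char *buf, size_t len)
 {
-    memcpy(dst, buf, len);
+    if (len > sizeof(dst))
+        return -1;
+    memcpy(dst, buf, len);
     return 0;
"""

features = extract_features("Fix buffer overflow in parse()", EXAMPLE_PATCH)
```

A feature vector built this way for every commit could then be passed to a classifier such as the random forest named in the abstract. The sketch covers only a subset of the 40 features and treats each one as a single number, whereas Table 1 assigns several feature indices to some entries.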
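Table 5 Instances of identified secret vulnerabilities
Vulnerability source | Commit time | Vulnerability type | Description |
tcpdump(5e48…fb8) | Mon, 13 Jul 2015 | Stack overflow | The function juniper_parseHeader() in print-juniper.c has a buffer overflow vulnerability, which an attacker can exploit to mount a denial-of-service attack and crash the application |
radare2(9bd0…7fb) | Fri, 15 Nov 2013 | Stack overflow | cmd_write.c does not check the data length of a variable, which may cause a stack overflow |
micropython(2daa…93b) | Fri, 1 Sep 2017 | Stack overflow | The format of input data is not validated, so an attacker can craft exploit code that overflows the struct buffer in modstruct.c and carry out a malicious attack |
ImageMagick(a464…0d0) | Mon, 11 Feb 2019 | Heap overflow | When processing SVG images, the data length of the variable message is not checked, leading to a heap overflow |
php-src(6ebe…d3c) | Thu, 27 May 2010 | Null pointer | In mysqlnd.c, the pointer variable result is used without a null check, so a remote attacker can bypass access restrictions or mount a denial-of-service attack and crash the application |
php-src(cdd9…004) | Wed, 22 Aug 2018 | Memory leak | In array.c, the result variable is not freed; if the callback function fails, a memory leak may occur, leading to a system crash or disclosure of sensitive information |
php-src(0a1c…b68) | Sat, 15 Oct 2011 | Integer overflow | The emalloc call in ext/soap/php_http.c has an integer overflow vulnerability, which an attacker with permission to execute the script can exploit to escalate privileges |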
[1] | ALFADEL M, COSTA D E, SHIHAB E, et al. On the use of dependabot security pull requests[C]// Proceedings of 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR). 2021: 254-265. |
[2] | PASHCHENKO I, PLATE H, PONTA S E, et al. Vulnerable open source dependencies: counting those that matter[C]// Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. 2018: 1-10. |
[3] | SABETTA A, BEZZI M. A practical approach to the automatic classification of security-relevant commits[C]// Proceedings of 2018 IEEE International Conference on Software Maintenance and Evolution. 2018: 579-582. |
[4] | KAMIYA T, KUSUMOTO S, INOUE K. CCFinder: a multilinguistic token-based code clone detection system for large scale source code[J]. IEEE Transactions on Software Engineering, 2002, 28(7): 654-670. |
[5] | LI Z, LU S, MYAGMAR S, et al. CP-Miner: finding copy-paste and related bugs in large-scale software code[J]. IEEE Transactions on Software Engineering, 2006, 32(3): 176-192. |
[6] | WANG Y W, YAO X H, GONG Y Z, et al. A method of buffer overflow detection based on static code analysis[J]. Journal of Computer Research and Development, 2012, 49(4): 839-845. |
[7] | WANG L, LI F, LI L, et al. Principle and practice of taint analysis[J]. Journal of Software, 2017, 28(4): 860-882. |
[8] | YAMAGUCHI F, LOTTMANN M, RIECK K. Generalized vulnerability extrapolation using abstract syntax trees[C]// Proceedings of the 28th Annual Computer Security Applications Conference. 2012: 359-368. |
[9] | LI J Y, ERNST M D. CBCD: cloned buggy code detector[C]// Proceedings of 2012 34th International Conference on Software Engineering (ICSE). 2012: 310-320. |
[10] | LI Z, ZOU D Q, XU S H, et al. VulDeePecker: a deep learning-based system for vulnerability detection[C]// Proceedings 2018 Network and Distributed System Security Symposium. 2018. |
[11] | TIAN Y, LAWALL J, LO D. Identifying Linux bug fixing patches[C]// Proceedings of 2012 34th International Conference on Software Engineering (ICSE). 2012: 386-396. |
[12] | PERL H, DECHAND S, SMITH M, et al. VCCFinder: finding potential vulnerabilities in open-source projects to assist code audits[C]// Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. 2015: 426-437. |
[13] | ZAMAN S, ADAMS B, HASSAN A E. Security versus performance bugs: a case study on Firefox[C]// Proceedings of the 8th Working Conference on Mining Software Repositories. 2011: 93-102. |
[14] | LI F, PAXSON V. A large-scale empirical study of security patches[C]// Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2017: 2201-2215. |
[15] | WANG X D, SUN K, BATCHELLER A, et al. An empirical study of secret security patch in open source software[M]// Adaptive Autonomous Secure Cyber Systems. 2020: 269-289. |
[16] | NEUHAUS S, ZIMMERMANN T, HOLLER C, et al. Predicting vulnerable software components[C]// Proceedings of the 14th ACM Conference on Computer and Communications Security. 2007: 529-540. |
[17] | ZHENG R F, FANG Y, LIU L. Homology analysis of malicious code based on dynamic-behavior fingerprint[J]. Journal of Sichuan University (Natural Science Edition), 2016, 53(4): 793-798. |
[18] | KONG D G, ZHENG Q, CHEN C, et al. ISA: a source code static vulnerability detection system based on data fusion[C]// Proceedings of the 2nd International ICST Conference on Scalable Information Systems. 2007: 55. |
[19] | SONNEKALB T. Machine-learning supported vulnerability detection in source code[C]// Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2019: 1180-1183. |
[20] | LI Y C, CUI Y Q, LYU J F, et al. Combined deep learning method for open source software vulnerability detection[J]. Computer Engineering and Applications, 2019, 55(11): 52-59. |
[21] | JIANG L X, MISHERGHI G, SU Z D, et al. DECKARD: scalable and accurate tree-based detection of code clones[C]// Proceedings of 29th International Conference on Software Engineering (ICSE'07). 2007: 96-105. |
[22] | ALON U, ZILBERSTEIN M, LEVY O, et al. code2vec: learning distributed representations of code[J]. Proceedings of the ACM on Programming Languages, 2019, 3: 40. |
[23] | LIU C, CHEN C, HAN J W, et al. GPLAG: detection of software plagiarism by program dependence graph analysis[C]// Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006: 872-881. |
[24] | PHAM N H, NGUYEN T T, NGUYEN H A, et al. Detection of recurring software vulnerabilities[C]// ASE '10: Proceedings of the IEEE/ACM International Conference on Automated Software Engineering. 2010: 447-456. |
[25] | LIU K, FANG Y, ZHANG L, et al. Malware clustering based on graph convolutional networks[J]. Journal of Sichuan University (Natural Science Edition), 2019, 56(4): 654-660. |
[26] | SHIN Y, MENEELY A, WILLIAMS L, et al. Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities[J]. IEEE Transactions on Software Engineering, 2011, 37(6): 772-787. |
[27] | NEIL L, MITTAL S, JOSHI A. Mining threat intelligence about open-source projects and libraries from code repository issues and bug reports[C]// Proceedings of 2018 IEEE International Conference on Intelligence and Security Informatics. 2018: 7-12. |
[28] | CAO Y, LIU L, WANG Y, et al. Software patch comparison technology through semantic analysis on function[J]. Chinese Journal of Network and Information Security, 2019, 5(5): 56-63. |
[29] | PONTA S E, PLATE H, SABETTA A, et al. A manually-curated dataset of fixes to vulnerabilities of open-source software[C]// Proceedings of 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR). 2019: 383-387. |
[30] | RAMOS J. Using TF-IDF to determine word relevance in document queries[J]. Proceedings of the First Instructional Conference on Machine Learning, 2003: 29-48. |
[31] | RISTAD E S, YIANILOS P N. Learning string-edit distance[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(5): 522-532. |
[32] | LYU W M, LIU J. The classification and analysis on safety holes of C/C++ programs[J]. Computer Engineering and Applications, 2005, 41(5): 123-125, 228. |
[33] | BREIMAN L. Random forests[J]. Machine Learning, 2001, 45(1): 5-32. |