Chinese Journal of Network and Information Security, 2022, Vol. 8, Issue (1): 52-62. doi: 10.11959/j.issn.2096-109x.2021094

• Column: Security perception and detection methods

Research on vulnerability identification technology oriented to project version differences

Cheng HUANG1,2, Mingxu SUN1, Renyu DUAN1, Susheng WU1, Bin CHEN1

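  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, Sichuan, China
  2. Guangxi Key Laboratory of Cryptography and Information Security, Guilin 541000, Guangxi, China
  • Revised: 2021-10-12 Online: 2022-02-15 Published: 2022-02-01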
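  • About the authors: Cheng HUANG (born 1987), male, from Chongqing, is an associate professor at Sichuan University. His main research interests include cyberspace security, attack detection, threat tracing, data mining, social networks, machine learning, and natural language processing.
    Mingxu SUN (born 2000), male, from Suihua, Heilongjiang. His main research interests include data mining, natural language processing, and vulnerability intelligence analysis.
    Renyu DUAN (born 1998), male, from Chongqing. His main research interests include vulnerability intelligence mining, computer vision, and artificial intelligence.
    Susheng WU (born 1999), male, from Hangzhou, Zhejiang. His main research interests include vulnerability discovery and the analysis and construction of open source code vulnerability databases.
    Bin CHEN (born 1999), male, from Nanchang, Jiangxi. His main research interests include vulnerability discovery and natural language processing.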
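  • Funding:
    National Natural Science Foundation of China (61902265); Key Research and Development Project of the Sichuan Provincial Department of Science and Technology (2020YFG0047); Research Project of the Guangxi Key Laboratory of Cryptography and Information Security (GCIS201921)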

Vulnerability identification technology research based on project version difference

Cheng HUANG1,2, Mingxu SUN1, Renyu DUAN1, Susheng WU1, Bin CHEN1   

  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
  2. Guangxi Key Laboratory of Cryptography and Information Security, Guilin 541000, China
  • Revised: 2021-10-12 Online: 2022-02-15 Published: 2022-02-01
  • Supported by:
    The National Natural Science Foundation of China (61902265); Sichuan Science and Technology Program (2020YFG0047); Guangxi Key Laboratory of Cryptography and Information Security (GCIS201921)

Abstract:

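Open source code hosting platforms have brought vitality and opportunities to the software development industry, but they also carry many security risks. The lack of standardization in open source code, the complexity of project dependency libraries, and the passive way in which vulnerability disclosure platforms collect vulnerabilities all affect the security of open source projects and of closed source projects that incorporate open source components. Most vulnerability fixes cannot be noticed and identified in time, which directly exposes the security risks of all kinds of projects to attackers. To discover vulnerability fixes in open source projects comprehensively and promptly, VpatchFinder, a vulnerability identification system based on project version differences, was designed and implemented. The system automatically obtains the updated code and content data of open source projects, then extracts and analyzes the code and textual descriptions before and after each update. Difference features based on security behaviors and code characteristics were proposed: a feature set of 40 features was built from four groups (project comment information, page statistics, code statistics, and vulnerability type), and the random forest algorithm was used to train a classifier that identifies vulnerabilities. In tests on real vulnerability data, VpatchFinder achieved a precision of 84.35%, an accuracy of 85.46%, and a recall of 85.09%, outperforming other common machine learning models. Further experiments on a curated set of CVE vulnerabilities in open source software from past years showed that 68.07% of the software vulnerabilities could be discovered in advance by VpatchFinder. The results can provide effective technical support for software security architecture design, development, and composition analysis.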
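The abstract does not enumerate the 40 features, so the following is only a rough sketch of what commit-level features of this kind might look like. The keyword list, the feature names, and the function `extract_features` are all hypothetical and are not taken from VpatchFinder; the sketch only illustrates turning a commit message and a unified diff into a numeric feature dictionary that a classifier such as a random forest could consume.

```python
import re

# Hypothetical keyword list (an assumption for illustration only);
# the paper's actual features are not listed in the abstract.
SECURITY_KEYWORDS = ("overflow", "xss", "injection", "cve",
                     "vulnerab", "security", "sanitize")


def extract_features(message: str, diff: str) -> dict:
    """Return a few illustrative features for one commit."""
    lowered = message.lower()
    added = removed = 0
    for line in diff.splitlines():
        # Skip unified-diff file headers such as "--- a/f.c" and "+++ b/f.c".
        # Simplification: a changed line whose own text starts with "--" or
        # "++" would also be skipped here.
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return {
        # Comment-style features derived from the commit message.
        "msg_length": len(message.split()),
        "security_keyword_hits": sum(lowered.count(k) for k in SECURITY_KEYWORDS),
        "mentions_cve_id": int(bool(
            re.search(r"CVE-\d{4}-\d{4,}", message, re.IGNORECASE))),
        # Code-statistics-style features derived from the diff.
        "lines_added": added,
        "lines_removed": removed,
    }


example_message = "Fix buffer overflow in parser (CVE-2020-12345)"
example_diff = (
    "--- a/parser.c\n"
    "+++ b/parser.c\n"
    "@@ -10,2 +10,3 @@\n"
    "-    strcpy(buf, src);\n"
    "+    strncpy(buf, src, sizeof(buf) - 1);\n"
    "+    buf[sizeof(buf) - 1] = 0;\n"
)
print(extract_features(example_message, example_diff))
```

The example commit and its CVE identifier are made up. In a pipeline like the one described, dictionaries of this shape would be collected for labeled commits and passed to the classifier for training.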

Keywords: vulnerability identification, open source platform, security fix, machine learning

Abstract:

Open source code hosting platforms have brought vitality and opportunities to software development, but they also carry many security risks. Open source code is often of poor quality, project dependency libraries are complex, and vulnerability collection platforms are inadequate at collecting vulnerabilities. These problems affect the security of open source projects and of complex software that incorporates open source components, and most security patches are not discovered and applied in time, so attackers can easily find such vulnerable software. To discover vulnerabilities in the open source community comprehensively and promptly, VpatchFinder, a vulnerability identification system based on project version differences, was proposed. The update contents of projects in the open source community were collected automatically, and features based on security behaviors and code differences were defined from the code and logs in patches. A total of 40 features, organized into a comment information feature group, a page statistics feature group, a code statistics feature group, and a vulnerability type feature group, were proposed to build the feature set, and a random forest model was trained as the classifier for vulnerability identification. The results show that VpatchFinder achieves a precision of 0.844, an accuracy of 0.855, and a recall of 0.851. In addition, on real open source CVE vulnerabilities, 68.07% of the vulnerabilities could be discovered early by VpatchFinder. These results can support software security architecture design and development.
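Because the abstract reports precision, accuracy, and recall side by side, a small sketch of how the three are computed from a binary confusion matrix may help keep them apart. The function name `classification_metrics` and the counts in the example are assumptions chosen for illustration; they are not the paper's data.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int):
    """Return (precision, recall, accuracy) for a binary classifier.

    tp: security patches correctly flagged
    fp: ordinary commits wrongly flagged as security patches
    tn: ordinary commits correctly left unflagged
    fn: security patches that were missed
    """
    precision = tp / (tp + fp)                  # flagged commits that are real fixes
    recall = tp / (tp + fn)                     # real fixes that were flagged
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # all commits classified correctly
    return precision, recall, accuracy


# Made-up counts, not results from the paper.
precision, recall, accuracy = classification_metrics(tp=80, fp=20, tn=90, fn=10)
print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```

Precision and recall both focus on the positive class (security patches), while accuracy also credits correctly ignored ordinary commits, which is why the three values can differ for the same classifier.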

Key words: vulnerability detection, open source platform, security patch, machine learning

