Journal on Communications (通信学报), 2021, Vol. 42, Issue (11): 233-241. doi: 10.11959/j.issn.1000-436x.2021196

• Correspondences •

Dual-granularity lightweight model for vulnerability code slicing method assessment

Bing ZHANG (张炳)1,2, Zheng WEN (文峥)1,2, Yuxuan ZHAO (赵宇轩)1, Ning WANG (王苧)1, Jiadong REN (任家东)1,2

  1. School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
  2. Key Laboratory of Software Engineering of Hebei Province, Qinhuangdao 066004, China
  • Revised: 2021-09-23 Online: 2021-11-25 Published: 2021-11-01
  • About the authors: Bing ZHANG (born 1989), male, from Huanggang, Hubei; Ph.D., associate professor and master's supervisor at Yanshan University. Research interests: data mining, machine learning, and software security.
    Zheng WEN (born 1998), male, from Baoding, Hebei; master's student at Yanshan University. Research interest: software security.
    Yuxuan ZHAO (born 1997), male, from Qinhuangdao, Hebei; master's student at Yanshan University. Research interests: text mining and software security.
    Ning WANG (born 1994), female, from Yangquan, Shanxi; master's student at Yanshan University. Research interest: software security.
    Jiadong REN (born 1967), male, from Qiqihar, Heilongjiang; Ph.D., professor and doctoral supervisor at Yanshan University. Research interests: temporal data modeling and software security.
  • Supported by:
    The National Natural Science Foundation of China (61802332); The National Natural Science Foundation of China (61807028); The National Natural Science Foundation of China (61772449); The Doctoral Foundation Program of Yanshan University (BL18012)

Abstract:

To address the problems in assessing existing vulnerability code slicing methods, namely incomplete extraction of slice information, high model complexity with poor generalization, and an open-loop evaluation process without feedback, a dual-granularity lightweight vulnerability code slicing evaluation (VCSE) model was proposed. For code snippets, a lightweight fusion model of TF-IDF and N-gram was constructed, which efficiently bypassed the out-of-vocabulary (OOV) problem, and the semantic and statistical features of code slices were extracted at two granularities, word and character. A heterogeneous ensemble classifier with high precision and good generalization performance was designed for vulnerability prediction and analysis. The experimental results show that the lightweight VCSE assesses slicing methods markedly better than the deep learning models in wide use today.

Key words: code slicing, vulnerability prediction, out-of-vocabulary, lightweight, assessment method
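The idea of fusing TF-IDF with N-grams at word and character granularity can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the tokenizer, the n-gram ranges, or the weighting variant, so the regular expression, the ranges (word 1-2, character 2-3), the smoothed IDF formula, and every function name below are assumptions chosen for illustration. The heterogeneous ensemble classifier is left out.

```python
import math
import re
from collections import Counter


def word_ngrams(code, n_max=2):
    """Word-level n-grams (assumed tokenizer: identifiers, numbers, single symbols)."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def char_ngrams(code, n_min=2, n_max=3):
    """Character-level n-grams over whitespace-normalized code."""
    text = " ".join(code.split())
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def fit_idf(docs_terms):
    """Smoothed inverse document frequency for every term seen in training."""
    n_docs = len(docs_terms)
    df = Counter(t for terms in docs_terms for t in set(terms))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(terms, idf):
    """TF-IDF weights; terms never seen in training are dropped."""
    tf = Counter(t for t in terms if t in idf)
    total = sum(tf.values()) or 1
    return {t: (c / total) * idf[t] for t, c in tf.items()}


def dual_features(code, word_idf, char_idf):
    """Concatenate word-level ('w') and character-level ('c') feature spaces."""
    feats = {("w", t): v for t, v in tfidf(word_ngrams(code), word_idf).items()}
    feats.update({("c", t): v
                  for t, v in tfidf(char_ngrams(code), char_idf).items()})
    return feats


# Toy "training" slices (hypothetical examples, not from the paper's data set).
slices = ["strcpy(buf, src);", "strncpy(buf, src, n);"]
word_idf = fit_idf([word_ngrams(s) for s in slices])
char_idf = fit_idf([char_ngrams(s) for s in slices])

# "memcpy" never appears as a word in training, so it is out of vocabulary
# at word granularity, yet its character n-grams such as "cpy" still match.
feats = dual_features("memcpy(buf, src, len);", word_idf, char_idf)
```

In this sketch the unseen identifier `memcpy` contributes nothing at word granularity, while its character n-grams still produce weighted features, which is one way a character-level view can sidestep the OOV problem. The resulting sparse feature dictionary would then be passed to a classifier.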
