Chinese Journal of Network and Information Security (网络与信息安全学报) ›› 2023, Vol. 9 ›› Issue (2): 132-142. doi: 10.11959/j.issn.2096-109x.2023027

• Academic paper •

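High-performance directional fuzzing scheme based on deep reinforcement learning

Tian XIAO1, Zhihao JIANG1,2, Peng TANG1, Zheng HUANG1, Jie GUO1, Weidong QIU1

  1. School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  2. Columbia University, New York 10027, USA
  • Revised: 2023-03-02 Online: 2023-04-25 Published: 2023-04-01
  • About the authors: Tian XIAO (肖天, 1998- ), male, born in Changzhou, Jiangsu, is a master's student at Shanghai Jiao Tong University. His main research interests are vulnerability mining and privacy protection.
    Zhihao JIANG (江智昊, 1998- ), male, born in Shanghai. His main research interests are file systems, vulnerability mining, and distributed systems.
    Peng TANG (唐鹏, 1992- ), male, born in Fuzhou, Jiangxi, is a doctoral student at Shanghai Jiao Tong University. His main research interests are artificial intelligence security and privacy protection.
    Zheng HUANG (黄征, 1975- ), male, born in Nanchong, Sichuan, holds a PhD and is an associate professor at Shanghai Jiao Tong University. His main research interests are privacy protection, computer vision, and artificial intelligence security.
    Jie GUO (郭捷, 1976- ), female, born in Xinyang, Henan, holds a PhD and is an associate research fellow at Shanghai Jiao Tong University. Her main research interests are multimedia security, pattern recognition, and big data analysis.
    Weidong QIU (邱卫东, 1973- ), male, born in Jiujiang, Jiangxi, holds a PhD and is a professor and doctoral supervisor at Shanghai Jiao Tong University. His main research interests are cryptanalysis and cryptographic engineering, artificial intelligence security, and big data privacy protection.
  • Supported by:
    The National Natural Science Foundation of China (61972249)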

Abstract:

With the rapid development of the mobile Internet and information technology, more and more applications have become part of people's daily lives, but the vulnerabilities in these applications pose a severe threat to user privacy and information security. In recent years, fuzzing has become one of the most popular vulnerability mining techniques and is widely used because the vulnerabilities it finds are easy to reproduce and its false positive rate is low. It randomly generates test cases and executes the program, relying on optimizations in coverage or sample generation to reach deeper program paths. However, the mutation operations in fuzzing are partly blind and tend to make the generated test cases execute the same program paths. Consequently, traditional fuzzing generally suffers from low efficiency, highly random input construction, and limited awareness of program structure. To address these problems, a high-performance directional fuzzing scheme based on deep reinforcement learning was proposed. The scheme obtains runtime information through program instrumentation and related methods, and uses a deep reinforcement learning network to guide the fuzzer's selection of test samples. It generates targeted, directed test samples that quickly approach and examine program paths that may contain vulnerabilities, thereby improving fuzzing efficiency. The experimental results showed that, on the LAVA-M dataset and the real-world applications LibPNG and Binutils, the proposed scheme performed better than the popular fuzzing tools AFL and AFLGO in vulnerability detection and reproduction. The scheme can therefore provide support for future vulnerability mining and security research.

Key words: vulnerability mining, fuzzing test, deep reinforcement learning, program path
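
The abstract does not give the network architecture, state representation, or reward used in the paper, so the following is only a toy illustration of the general idea of feedback-guided seed selection in directed fuzzing, not the authors' method. It swaps the deep reinforcement learning network for a much simpler epsilon-greedy value table, and it simulates instrumentation feedback with a byte-distance function. All names (`TARGET`, `distance`, `mutate`, `directed_fuzz`) and all parameter values are hypothetical.

```python
import random

TARGET = b"FUZZ"  # hypothetical input that reaches the "vulnerable" path


def distance(data: bytes) -> int:
    """Simulated instrumentation feedback: how many of the first
    len(TARGET) bytes still differ from TARGET (0 = target reached)."""
    padded = data[:len(TARGET)].ljust(len(TARGET), b"\x00")
    return sum(a != b for a, b in zip(padded, TARGET))


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random position with a random byte value."""
    buf = bytearray(data[:len(TARGET)].ljust(len(TARGET), b"\x00"))
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def directed_fuzz(seeds, budget=100_000, epsilon=0.1, alpha=0.2, rng_seed=0):
    """Epsilon-greedy seed selection (a stand-in for a learned policy).

    Each seed keeps a value estimate of the reward its mutants earn,
    where the reward is the negated distance of the mutant to the target.
    Returns (input reaching the target or None, number of executions)."""
    rng = random.Random(rng_seed)
    queue = list(seeds)
    value = [-float(distance(s)) for s in queue]
    for step in range(1, budget + 1):
        if rng.random() < epsilon:
            i = rng.randrange(len(queue))                      # explore
        else:
            i = max(range(len(queue)), key=value.__getitem__)  # exploit
        child = mutate(queue[i], rng)
        d = distance(child)
        value[i] += alpha * (-d - value[i])  # move estimate toward reward
        if d == 0:
            return child, step
        if d < distance(queue[i]):           # keep mutants that got closer
            queue.append(child)
            value.append(-float(d))
    return None, budget
```

Calling `directed_fuzz([b"AAAA"])` returns the input that reached the simulated target together with the number of executions it took. Seeds whose mutants land closer to the target accumulate higher value estimates and are picked more often, which is the directed behaviour the abstract describes, here in a deliberately simplified form.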

