High-performance directional fuzzing scheme based on deep reinforcement learning

doi:10.11959/j.issn.2096-109x.2023027

Chinese Journal of Network and Information Security, 2023, Vol. 9, Issue (2): 132-142.

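Tian XIAO¹, Zhihao JIANG¹,², Peng TANG¹, Zheng HUANG¹, Jie GUO¹, Weidong QIU¹

¹ School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
² Columbia University, New York 10027, USA

Revised: 2023-03-02  Online: 2023-04-25  Published: 2023-04-01
About the authors:
Tian XIAO (1998- ), male, born in Changzhou, Jiangsu, is a master's student at Shanghai Jiao Tong University. His main research interests are vulnerability mining and privacy protection.
Zhihao JIANG (1998- ), male, born in Shanghai. His main research interests are file systems, vulnerability mining, and distributed systems.
Peng TANG (1992- ), male, born in Fuzhou, Jiangxi, is a PhD student at Shanghai Jiao Tong University. His main research interests are artificial intelligence security and privacy protection.
Zheng HUANG (1975- ), male, born in Nanchong, Sichuan, PhD, is an associate professor at Shanghai Jiao Tong University. His main research interests are privacy protection, computer vision, and artificial intelligence security.
Jie GUO (1976- ), female, born in Xinyang, Henan, PhD, is an associate researcher at Shanghai Jiao Tong University. Her main research interests are multimedia security, pattern recognition, and big data analysis.
Weidong QIU (1973- ), male, born in Jiujiang, Jiangxi, PhD, is a professor and doctoral supervisor at Shanghai Jiao Tong University. His main research interests are cryptanalysis and cryptographic engineering, artificial intelligence security, and big data privacy protection.
Supported by:
The National Natural Science Foundation of China (61972249)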

Abstract:

With the rapid development of the mobile Internet and information technology, a growing number of applications have become part of people's daily lives. However, the vulnerabilities in these applications pose a severe threat to information security and users' privacy. In recent years, fuzzing has been widely used as one of the popular vulnerability mining techniques because the vulnerabilities it finds are easy to reproduce and its false positive rate is low. It generates test cases randomly and executes the program, using optimizations in coverage or sample generation to reach deeper program paths. However, the mutation operations in fuzzing are partly blind and tend to make the generated test cases execute the same program paths. Consequently, traditional fuzzing generally suffers from low efficiency, highly random input construction, and algorithms that take little account of program structure. To address these problems, a high-performance directional fuzzing scheme based on deep reinforcement learning was proposed. It obtained runtime information through program instrumentation and similar methods, and used a deep reinforcement learning network to guide the fuzzer's selection of test cases, generating targeted, directed test cases that quickly approached and checked program paths that might contain vulnerabilities, thereby improving fuzzing efficiency. Experimental results showed that, on the LAVA-M dataset and the real applications LibPNG and Binutils, the proposed scheme outperformed the popular fuzzing tools AFL and AFLGO in vulnerability detection and reproduction. The scheme can therefore support future vulnerability mining and security research.

Key words: vulnerability mining, fuzzing test, deep reinforcement learning, program path
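The abstract describes the selection loop only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It replaces the deep reinforcement learning network with a per-seed value table updated by the stateless form of the Q-learning rule, and it replaces real instrumentation feedback with a toy `distance` function. The names `TARGET`, `distance`, `mutate`, and `directed_fuzz` are all hypothetical, as are the parameter values.

```python
import random

TARGET = b"FUZZ"  # hypothetical magic bytes guarding the target path


def distance(data: bytes) -> int:
    """Toy stand-in for instrumentation feedback: bytes that differ from TARGET."""
    return sum(a != b for a, b in zip(data, TARGET))


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def directed_fuzz(seeds, steps=20000, epsilon=0.1, alpha=0.2, rng_seed=0):
    """Return (best_input, best_distance) after at most `steps` mutations.

    Seeds are assumed to have the same length as TARGET.
    """
    rng = random.Random(rng_seed)
    queue = list(seeds)
    # One value estimate per seed, initialised from the seed's own distance.
    # This table is the simplification that stands in for a deep network.
    q = [-float(distance(s)) for s in queue]
    best = min(queue, key=distance)
    for _ in range(steps):
        if distance(best) == 0:
            break  # target reached
        # Epsilon-greedy selection: mostly pick the seed with the highest value.
        if rng.random() < epsilon:
            i = rng.randrange(len(queue))
        else:
            i = max(range(len(queue)), key=q.__getitem__)
        child = mutate(queue[i], rng)
        reward = -distance(child)  # closer to the target means higher reward
        q[i] += alpha * (reward - q[i])  # stateless Q-learning update
        if distance(child) < distance(queue[i]):
            # Keep children that move closer to the target than their parent.
            queue.append(child)
            q.append(float(reward))
        if distance(child) < distance(best):
            best = child
    return best, distance(best)
```

Calling `directed_fuzz([b"AAAA", b"FZZZ"])` returns the closest input found and its remaining distance. The sketch keeps the selection policy (which seed to mutate next) separate from the feedback signal (`distance`), which is where a real directed fuzzer would plug in instrumentation data.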

CLC number: TP393

Tian XIAO, Zhihao JIANG, Peng TANG, Zheng HUANG, Jie GUO, Weidong QIU. High-performance directional fuzzing scheme based on deep reinforcement learning[J]. Chinese Journal of Network and Information Security, 2023, 9(2): 132-142.

Figures and tables (7): Figures 1-4, Tables 1-3


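Table 1
Test target	Tool	Times covered	Mean time to discovery/s	Performance gain
who	AFL	3	68130	3.89
who	AFLGO	4	30629	1.75
who	Proposed scheme	6	17542	-
base64	AFL	5	3957	4.74
base64	AFLGO	5	1571	1.88
base64	Proposed scheme	5	834	-
md5sum	AFL	1	67152	1.79
md5sum	AFLGO	4	42969	1.15
md5sum	Proposed scheme	5	37578	-
uniq	AFL	5	50183	2.87
uniq	AFLGO	7	17466	1.22
uniq	Proposed scheme	7	14323	-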

Table 2
Program	CVE ID	Vulnerability type
LibPNG	CVE-2011-2501	Buffer overflow
LibPNG	CVE-2011-3328	Division by zero
Binutils	CVE-2016-4487	Invalid write
Binutils	CVE-2016-4488	Invalid write
Binutils	CVE-2016-4489	Invalid write
Binutils	CVE-2016-4491	Illegal access
Binutils	CVE-2016-4492	Stack error
Binutils	CVE-2016-6131	Illegal access

Table 3
CVE	AFL	AFLGO	Proposed scheme	P value
CVE-2016-4487	830 (2.36)	512 (1.46)	351	0.02156
CVE-2016-4488	1671 (3.47)	901 (1.87)	482	0.01804
CVE-2016-4489	1328 (3.49)	667 (1.76)	380	0.01906
CVE-2016-4491	31760 (2.99)	27983 (2.63)	10633	0.00028
CVE-2016-4492	948 (2.46)	640 (1.66)	385	0.02804
CVE-2016-6131	33895 (3.45)	21280 (2.17)	9821	0.00098
CVE-2011-2501	2162 (4.55)	672 (1.41)	475	0.02704
CVE-2011-3328	12736 (6.77)	3165 (1.68)	1880	0.012
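
The parenthesised values in the table above are not defined on this page. They appear to be the baseline tool's value divided by the proposed scheme's value, rounded to two decimals. That reading is an inference from the numbers, not a statement from the paper. The short check below recomputes the ratios from the table's figures, and `speedup` is a hypothetical helper name.

```python
# Values copied from the table above: (AFL, AFLGO, proposed scheme).
results = {
    "CVE-2016-4487": (830, 512, 351),
    "CVE-2016-4488": (1671, 901, 482),
    "CVE-2016-4489": (1328, 667, 380),
    "CVE-2016-4491": (31760, 27983, 10633),
    "CVE-2016-4492": (948, 640, 385),
    "CVE-2016-6131": (33895, 21280, 9821),
    "CVE-2011-2501": (2162, 672, 475),
    "CVE-2011-3328": (12736, 3165, 1880),
}


def speedup(baseline: float, proposed: float) -> float:
    """Ratio of a baseline value to the proposed scheme's value, to 2 decimals."""
    return round(baseline / proposed, 2)


for cve, (afl, aflgo, ours) in results.items():
    print(f"{cve}: AFL {speedup(afl, ours):.2f}x, AFLGO {speedup(aflgo, ours):.2f}x")
```

Under this reading, a factor of 2.36 for AFL on CVE-2016-4487 means AFL's value is 2.36 times that of the proposed scheme (830 versus 351).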
