Deception defense method against intelligent penetration attack

Journal on Communications, 2022, Vol. 43, Issue (10): 106-120. doi:10.11959/j.issn.1000-436x.2022202


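Jinyin CHEN¹,², Shulong HU¹,², Changyou XING³, Guomin ZHANG³

¹ College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
² Institute of Cyber Space Security, Zhejiang University of Technology, Hangzhou 310023, China
³ College of Command & Control Engineering, Army Engineering University, Nanjing 210007, China

Revised: 2022-09-29  Publication date: 2022-10-25  Release date: 2022-10-01
About the authors: Jinyin CHEN (1982– ), female, born in Xiangshan, Zhejiang, Ph.D., professor and doctoral supervisor at Zhejiang University of Technology. Her research interests include artificial intelligence, data mining and intelligent computing.
Shulong HU (1998– ), male, born in Ji'an, Jiangxi, master's student at Zhejiang University of Technology. His research interests include deep reinforcement learning and network security.
Changyou XING (1982– ), male, born in Nanjing, Jiangsu, Ph.D., associate professor and master's supervisor at Army Engineering University. His research interests include network security, software-defined networking, network measurement and network function virtualization.
Guomin ZHANG (1979– ), male, born in Nanjing, Jiangsu, Ph.D., associate professor and master's supervisor at Army Engineering University. His research interests include software-defined networking, network security, network measurement and network function virtualization.
Supported by:
The National Natural Science Foundation of China (62072406); The Key Research and Development Program of Zhejiang Province (2021C01117); The 2020 Industrial Internet Innovation Development Project (TC200H01V); The Ten Thousand Talents Program of Zhejiang Province (2020R52011)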


Abstract

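Intelligent penetration attacks based on reinforcement learning model the penetration process as a Markov decision process and train the attacker by trial and error to find optimal penetration paths, which gives the attacker strong attack capability. To prevent such attacks from being maliciously exploited, a deception defense method against reinforcement-learning-based intelligent penetration attacks is proposed. First, the information the attacker needs to build its penetration model (state, action and reward) is obtained. Second, deception is applied by inverting state dimensions, disrupting action generation, and flipping the sign of the reward, corresponding respectively to the early, middle and final stages of the penetration attack. Finally, comparison experiments of the three defenses are carried out in the same network environment. The results show that the proposed method effectively reduces the success rate of reinforcement-learning-based intelligent penetration attacks; in particular, the deception method that disrupts the attacker's action generation reduces the penetration success rate to 0 at an interference ratio of 20%.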
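The three perturbations described above can be sketched as plain functions placed between the environment and the attacker. This is a minimal illustration under stated assumptions, not the authors' implementation: the state is assumed to be a binary vector, actions are assumed to be indexed 0 to n−1, and the interference ratio is interpreted either as the fraction of state dimensions inverted or as the per-step probability of replacing the action or flipping the reward. All function names are hypothetical.

```python
import random
from typing import List, Sequence

# Hypothetical helpers; the meaning of `ratio` below is an assumption,
# not the paper's definition of the interference ratio.


def invert_state_dims(state: Sequence[int], ratio: float, rng: random.Random) -> List[int]:
    """Early stage: invert round(ratio * len(state)) randomly chosen binary dimensions."""
    observed = list(state)
    k = round(ratio * len(observed))
    for i in rng.sample(range(len(observed)), k):
        observed[i] = 1 - observed[i]  # 0 -> 1, 1 -> 0
    return observed


def disrupt_action(action: int, n_actions: int, ratio: float, rng: random.Random) -> int:
    """Middle stage: with probability `ratio`, execute a uniformly random action instead."""
    if rng.random() < ratio:
        return rng.randrange(n_actions)
    return action


def flip_reward(reward: float, ratio: float, rng: random.Random) -> float:
    """Final stage: with probability `ratio`, hand the attacker the negated reward."""
    return -reward if rng.random() < ratio else reward


if __name__ == "__main__":
    rng = random.Random(0)
    print(invert_state_dims([1, 0, 1, 0], 1.0, rng))  # [0, 1, 0, 1]
    print(flip_reward(100.0, 1.0, rng))               # -100.0
```

With a ratio of 1.0 every state dimension is inverted and every reward negated; with 0.0 the attacker's view is untouched. Under this interpretation a randomly drawn replacement action may occasionally coincide with the attacker's own choice.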

Key words: reinforcement learning, intelligent penetration attack, attack path, deception defense


CLC number: TP393.08

Jinyin CHEN, Shulong HU, Changyou XING, Guomin ZHANG. Deception defense method against intelligent penetration attack[J]. Journal on Communications, 2022, 43(10): 106-120.

Figures and tables (14): Figures 1, 2 and 5–12; Tables 1–4. Figure images are not reproduced in this text.


Table 1
Host address	Operating system	Vulnerable services	Privilege-escalation process	Host value
(1,0)	Linux	HTTP	Tomcat	0
(2,0)	Linux	SSH, FTP	/	100
(3,0)	Windows	FTP	/	0
(3,1)	Windows	FTP, HTTP	Daclsvc	0
(3,2)	Windows	FTP, HTTP	Daclsvc	0
(3,3)	Windows	FTP	/	0
(3,4)	Windows	FTP	Daclsvc	0
(4,0)	Linux	SSH, FTP	Tomcat	100
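Table 1 can be encoded directly as a data structure for a simulated target network. The sketch below is hypothetical: it treats each address as a (subnet, host) pair, writes "/" as None, and assumes, in line with the "probability of compromising sensitive hosts" metric reported in the results, that the sensitive hosts are those with non-zero value and that an attack succeeds once all of them are compromised.

```python
from typing import Dict, FrozenSet, NamedTuple, Optional, Set, Tuple

Address = Tuple[int, int]  # (subnet, host), as written in Table 1


class Host(NamedTuple):
    os: str
    services: FrozenSet[str]  # vulnerable services
    process: Optional[str]    # privilege-escalation process, None for "/"
    value: int


# Rows of Table 1.
HOSTS: Dict[Address, Host] = {
    (1, 0): Host("Linux", frozenset({"HTTP"}), "Tomcat", 0),
    (2, 0): Host("Linux", frozenset({"SSH", "FTP"}), None, 100),
    (3, 0): Host("Windows", frozenset({"FTP"}), None, 0),
    (3, 1): Host("Windows", frozenset({"FTP", "HTTP"}), "Daclsvc", 0),
    (3, 2): Host("Windows", frozenset({"FTP", "HTTP"}), "Daclsvc", 0),
    (3, 3): Host("Windows", frozenset({"FTP"}), None, 0),
    (3, 4): Host("Windows", frozenset({"FTP"}), "Daclsvc", 0),
    (4, 0): Host("Linux", frozenset({"SSH", "FTP"}), "Tomcat", 100),
}


def sensitive_hosts(hosts: Dict[Address, Host]) -> Set[Address]:
    """Assumption: a host is sensitive if its value is non-zero."""
    return {addr for addr, h in hosts.items() if h.value > 0}


def attack_succeeded(compromised: Set[Address], hosts: Dict[Address, Host] = HOSTS) -> bool:
    """Assumed goal test: every sensitive host has been compromised."""
    return sensitive_hosts(hosts) <= compromised
```

Under these assumptions the goal set is {(2,0), (4,0)}, so compromising (2,0) alone does not count as a successful attack.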

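Table 2
Name	Type	Operating system	Cost	Success probability	Access gained
SSH	Exploit	Linux	3	0.9	User
FTP	Exploit	Windows	1	0.6	User
HTTP	Exploit	/	2	0.9	User
Tomcat	Privilege escalation	Linux	1	1	Root
Daclsvc	Privilege escalation	Windows	1	1	Root
Service scan	Scan	/	1	1	/
OS scan	Scan	/	1	1	/
Subnet scan	Scan	/	1	1	/
Process scan	Scan	/	1	1	/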
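Table 2 specifies each action by its cost and success probability. A minimal sketch of how such an action could be executed in a simulator follows. The reward rule (the target's value on the first successful exploit or privilege escalation, minus the action's cost) and the operating-system applicability check are assumptions for illustration, not the authors' exact model; all names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Action:
    name: str
    kind: str          # "exploit", "privesc" or "scan"
    os: Optional[str]  # required target OS; None stands for "/" (any OS)
    cost: float
    prob: float        # success probability


# Rows of Table 2.
ACTIONS = [
    Action("SSH", "exploit", "Linux", 3, 0.9),
    Action("FTP", "exploit", "Windows", 1, 0.6),
    Action("HTTP", "exploit", None, 2, 0.9),
    Action("Tomcat", "privesc", "Linux", 1, 1.0),
    Action("Daclsvc", "privesc", "Windows", 1, 1.0),
    Action("service_scan", "scan", None, 1, 1.0),
    Action("os_scan", "scan", None, 1, 1.0),
    Action("subnet_scan", "scan", None, 1, 1.0),
    Action("process_scan", "scan", None, 1, 1.0),
]


def execute(action: Action, target_os: str, target_value: float,
            already_owned: bool, rng: random.Random) -> Tuple[bool, float]:
    """Return (success, reward) under the assumed reward rule."""
    applicable = action.os is None or action.os == target_os
    success = applicable and rng.random() < action.prob
    # Value is only collected once per host and never by a scan.
    gained = success and action.kind != "scan" and not already_owned
    return success, (target_value if gained else 0.0) - action.cost


def expected_reward(action: Action, target_value: float) -> float:
    """Expected one-step reward against an applicable, not-yet-owned target."""
    gain = 0.0 if action.kind == "scan" else action.prob * target_value
    return gain - action.cost
```

For example, under this rule an SSH exploit against the Linux host (2,0) from Table 1, worth 100, has an expected one-step reward of 0.9 × 100 − 3 = 87.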

Table 3
Interference ratio	Reward	Steps per episode	Probability of compromising sensitive hosts
20%	180	12	100%
50%	176	15	100%
80%	-1030	1000	0%
100%	-1172	1000	0%

Table 4
Interference ratio	Reward	Steps per episode	Probability of compromising sensitive hosts
20%	180	12	2%
50%	-109	5	100%
80%	-112	6	100%
100%	-109	5	100%
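Each row of Tables 3 and 4 reports an average reward, an average episode length and a probability of compromising the sensitive hosts. One plausible way to aggregate these quantities from evaluation episodes is sketched below. The episode record format is hypothetical, and reading the 1000-step rows as episodes that ran until a step limit without reaching the sensitive hosts is an inference, not something the tables state.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass
class Episode:
    total_reward: float
    steps: int
    reached_sensitive: bool  # True if all sensitive hosts were compromised


def summarize(episodes: List[Episode]) -> Tuple[float, float, float]:
    """Return (mean reward, mean steps per episode, success probability)."""
    if not episodes:
        raise ValueError("need at least one episode")
    success = sum(e.reached_sensitive for e in episodes) / len(episodes)
    return (mean(e.total_reward for e in episodes),
            mean(e.steps for e in episodes),
            success)
```

Averaging a short successful episode with one truncated at the step limit shows why the mean reward and mean episode length move together in the 80% and 100% rows of Table 3.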
