Deception defense method against intelligent penetration attack

doi:10.11959/j.issn.1000-436x.2022202

Abstract

Intelligent penetration attacks based on reinforcement learning model the penetration process as a Markov decision process and train the attacker by trial and error to optimize its penetration path, which yields strong attack performance. To prevent such attacks from being maliciously exploited, a deception defense method against reinforcement-learning-based intelligent penetration attacks is proposed. First, the information the attacker needs to construct its penetration model, namely the state, action, and reward, is obtained. Second, the attacker is deceived by inverting the state dimension, disrupting action generation, and flipping the sign of the reward value, which correspond to the early, middle, and final stages of the penetration attack, respectively. Finally, comparison experiments on the three defense stages are carried out in the same network environment. The results show that the proposed method effectively reduces the success rate of reinforcement-learning-based intelligent penetration attacks. In particular, the deception method that disrupts the attacker's action generation reduces the attack success rate to 0 at an interference ratio of 20%.
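The three deception operations named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give implementation details, so everything here is an assumption made for illustration: the function names are hypothetical, "inverting the state dimension" is read as reversing the order of the observation vector, and the interference ratio is read as the per-step probability that the defender tampers with the signal.

```python
import random
from typing import List, Sequence


def deceive_state(state: Sequence[int], ratio: float, rng: random.Random) -> List[int]:
    """Early stage (assumed reading): with probability `ratio`, hand the attacker
    the observation vector with its dimensions in reversed order."""
    if rng.random() < ratio:
        return list(reversed(state))
    return list(state)


def deceive_action(action: int, n_actions: int, ratio: float, rng: random.Random) -> int:
    """Middle stage (assumed reading): with probability `ratio`, replace the chosen
    action by a uniformly random one (which may happen to be the same action)."""
    if rng.random() < ratio:
        return rng.randrange(n_actions)
    return action


def deceive_reward(reward: float, ratio: float, rng: random.Random) -> float:
    """Final stage: with probability `ratio`, flip the sign of the reward."""
    if rng.random() < ratio:
        return -reward
    return reward


if __name__ == "__main__":
    rng = random.Random(0)
    # With ratio 1.0 every signal is tampered with; with ratio 0.0 none is.
    print(deceive_state([1, 0, 0, 1, 1], 1.0, rng))
    print(deceive_reward(100.0, 1.0, rng))
    print(deceive_action(3, 10, 0.0, rng))
```

In a training loop, each function would wrap the corresponding signal exchanged between the environment and the attacking agent, so that the agent learns from corrupted states, executes corrupted actions, or is reinforced by corrupted rewards.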

Keywords: reinforcement learning, intelligent penetration attack, attack path, deception defense

CLC Number: TP393.08

Jinyin CHEN, Shulong HU, Changyou XING, Guomin ZHANG. Deception defense method against intelligent penetration attack[J]. Journal on Communications, 2022, 43(10): 106-120.

Host address	Operating system	Vulnerable services	Privilege-escalation process	Host value
(1,0)	Linux	HTTP	Tomcat	0
(2,0)	Linux	SSH, FTP	/	100
(3,0)	Windows	FTP	/	0
(3,1)	Windows	FTP, HTTP	Daclsvc	0
(3,2)	Windows	FTP, HTTP	Daclsvc	0
(3,3)	Windows	FTP	/	0
(3,4)	Windows	FTP	Daclsvc	0
(4,0)	Linux	SSH, FTP	Tomcat	100
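The host table above can be encoded as a small data structure, which makes it easy to pick out the hosts the attacker is rewarded for compromising. This is a minimal sketch: the `Host` class and its field names are hypothetical, the address is assumed to be a (subnet, host index) pair, and a "sensitive" host is assumed to be one with a non-zero value.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Host:
    address: Tuple[int, int]     # assumed to mean (subnet, host index)
    os: str
    services: Tuple[str, ...]    # vulnerable services
    priv_esc: Optional[str]      # privilege-escalation process; None where the table shows "/"
    value: int


HOSTS = [
    Host((1, 0), "Linux", ("HTTP",), "Tomcat", 0),
    Host((2, 0), "Linux", ("SSH", "FTP"), None, 100),
    Host((3, 0), "Windows", ("FTP",), None, 0),
    Host((3, 1), "Windows", ("FTP", "HTTP"), "Daclsvc", 0),
    Host((3, 2), "Windows", ("FTP", "HTTP"), "Daclsvc", 0),
    Host((3, 3), "Windows", ("FTP",), None, 0),
    Host((3, 4), "Windows", ("FTP",), "Daclsvc", 0),
    Host((4, 0), "Linux", ("SSH", "FTP"), "Tomcat", 100),
]

# Hosts with non-zero value are treated as the sensitive targets (assumption).
sensitive = [h.address for h in HOSTS if h.value > 0]
total_value = sum(h.value for h in HOSTS)
print(sensitive, total_value)  # [(2, 0), (4, 0)] 200
```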

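Name	Type	Operating system	Cost	Success probability	Access level
SSH	Exploit	Linux	3	0.9	User
FTP	Exploit	Windows	1	0.6	User
HTTP	Exploit	/	2	0.9	User
Tomcat	Privilege escalation	Linux	1	1	Root
Daclsvc	Privilege escalation	Windows	1	1	Root
Service scan	Scan	/	1	1	/
OS scan	Scan	/	1	1	/
Subnet scan	Scan	/	1	1	/
Process scan	Scan	/	1	1	/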
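The cost and probability columns of the action table can be combined into an expected cost per successful use of an action. The sketch below assumes, purely for illustration, that the attacker repeats a failed action until it succeeds and that attempts are independent; the number of attempts is then geometric with mean 1/p, so the expected total cost is cost/p. The dictionary layout and function name are hypothetical.

```python
from typing import Dict, Optional, Tuple

# name -> (type, target OS or None for any, cost, success probability)
ACTIONS: Dict[str, Tuple[str, Optional[str], int, float]] = {
    "SSH": ("exploit", "Linux", 3, 0.9),
    "FTP": ("exploit", "Windows", 1, 0.6),
    "HTTP": ("exploit", None, 2, 0.9),
    "Tomcat": ("privilege escalation", "Linux", 1, 1.0),
    "Daclsvc": ("privilege escalation", "Windows", 1, 1.0),
    "Service scan": ("scan", None, 1, 1.0),
    "OS scan": ("scan", None, 1, 1.0),
    "Subnet scan": ("scan", None, 1, 1.0),
    "Process scan": ("scan", None, 1, 1.0),
}


def expected_cost_until_success(cost: float, prob: float) -> float:
    """Expected total cost when an action is retried independently until it succeeds."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("success probability must be in (0, 1]")
    return cost / prob


for name, (kind, _os, cost, prob) in ACTIONS.items():
    if kind == "exploit":
        print(f"{name}: {expected_cost_until_success(cost, prob):.2f}")
```

Under these assumptions the FTP exploit is the cheapest in expectation despite its lower success probability, because its per-attempt cost is only 1.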

Interference ratio	Reward	Steps per episode	Probability of compromising sensitive hosts
20%	180	12	100%
50%	176	15	100%
80%	-1030	1000	0%
100%	-1172	1000	0%

Interference ratio	Reward	Steps per episode	Probability of compromising sensitive hosts
20%	180	12	2%
50%	-109	5	100%
80%	-112	6	100%
100%	-109	5	100%
