Automated Windows domain penetration method based on reinforcement learning

doi:10.11959/j.issn.2096-109x.2023057

Abstract


Windows domains provide unified system services for resource sharing and information exchange among users. However, while this facilitates intranet management, it also introduces significant security risks. In recent years, intranet attacks targeting domain controllers have become increasingly prevalent, so automated penetration testing is needed to detect vulnerabilities and keep office networks operating securely. Efficient identification of attack paths within the domain environment is therefore crucial. The penetration process was first modeled using reinforcement learning, and attack paths were then discovered and verified through the interaction of the model with the domain environment. Furthermore, unnecessary states in the reinforcement learning model were pruned based on how much each host contributes to the penetration process, in order to optimize the path selection strategy and improve actual attack efficiency. Q-learning with solution space refinement and exploration policy optimization was used to select the optimal attack path. With this method, all security threats in the domain can be automatically verified, giving domain administrators a valuable basis for protection. Experiments were conducted on typical Windows domain scenarios. The results show that the proposed method generates thirteen efficient paths and selects the optimal one from them, and that it outperforms other approaches in terms of domain controller intrusion, domain host intrusion, attack steps, convergence, and time cost.

Key words: Windows domain, penetration testing, reinforcement learning, attack path

Lige ZHAN, Letian SHA, Fu XIAO, Jiankuo DONG, Pinchang ZHANG. Automated Windows domain penetration method based on reinforcement learning[J]. Chinese Journal of Network and Information Security, 2023, 9(4): 104-120.




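Host: short name	No.	Operating system	Vulnerability: type	Domain users on host	Privileges
Web server: Web	1	Windows 2012	CVE_2019_2725: privilege acquisition	User2	Web, PC5, PC6
Partition management host: PC1	2	Windows 7	CVE_2019_0708: privilege acquisition	User1	DB, PC1
Operations and maintenance host: PC2	3	Windows 10	CVE_2020_0796: privilege acquisition	User3	PC2
Logistics support host: PC3	4	Windows 7	CVE_2017_0143: privilege acquisition	User4	PC3
Business host 1: PC4	5	Windows 7	CVE_2018_8120: privilege escalation	User5	PC4
Business host 2: PC5	6	Windows 7	CVE_2017_0143: privilege acquisition	User2	Web, PC5, PC6
Business host 3: PC6	7	Windows XP	CVE_2008_4250: privilege acquisition	User2	Web, PC5, PC6
Database server: DB	8	Windows 2008	CVE_2017_7269: privilege acquisition; CVE_2016_3225: privilege escalation	User1; Admin	User1: DB, PC1; Admin: ALL
Mail server: MAIL	9	Windows 2008	CVE_2020_0787: privilege escalation	User6	MAIL
Domain controller: DC	10	Windows 2008	None	Admin	ALL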
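
The user and privilege columns of the host table can be read as a credential graph. The following is a minimal sketch of that reading, not the authors' model. It assumes that compromising a host exposes the credentials of every domain user listed on it, and that the privileges column lists the hosts that user can access. It ignores the exploitation and privilege-escalation steps that a real attack would need first (for example, on DB). The names `USERS_ON_HOST`, `USER_RIGHTS`, and `reachable_by_credentials` are hypothetical.

```python
# Domain users present on each host, copied from the host table.
# Assumption: compromising a host exposes all users listed on it.
USERS_ON_HOST = {
    "Web": ["User2"], "PC1": ["User1"], "PC2": ["User3"], "PC3": ["User4"],
    "PC4": ["User5"], "PC5": ["User2"], "PC6": ["User2"],
    "DB": ["User1", "Admin"], "MAIL": ["User6"], "DC": ["Admin"],
}
ALL_HOSTS = list(USERS_ON_HOST)

# Hosts each user can access, copied from the privileges column.
USER_RIGHTS = {
    "User1": ["DB", "PC1"], "User2": ["Web", "PC5", "PC6"],
    "User3": ["PC2"], "User4": ["PC3"], "User5": ["PC4"],
    "User6": ["MAIL"], "Admin": ALL_HOSTS,
}


def reachable_by_credentials(start):
    """Return every host reachable from `start` by credential reuse alone."""
    reached = {start}
    frontier = [start]
    while frontier:
        host = frontier.pop()
        for user in USERS_ON_HOST[host]:
            for target in USER_RIGHTS[user]:
                if target not in reached:
                    reached.add(target)
                    frontier.append(target)
    return reached


print(sorted(reachable_by_credentials("Web")))
print(sorted(reachable_by_credentials("PC1")))
```

Under these simplifying assumptions, a foothold on Web only spreads to PC5 and PC6, while a foothold on PC1 leads through User1 to DB and from there, through Admin, to the whole domain including DC.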

State No.	State
1	1L
2	1L-2L
3	1L-2L-5U
4	1L-2L-3L-4L-5U
5	1L-2L-3L-5U
6	1L-2L-5L
7	1L-2L-3L-4L-5L
8	1L-2L-3L-5L
9	1L-2L-4L-5U
10	1L-2L-4L-5L
11	1L-2L-3L
12	1L-2L-3L-4L
13	1L-2L-4L
14	1L-3L
15	1L-3L-5L
16	1L-3L-4L
17	1L-3L-4L-5L
18	1L-4L
19	G

Action No.	Valid action (target)
a1	Vulnerability exploitation: CVE_2019_0708 (PC1)
a2	Vulnerability exploitation: CVE_2020_0796 (PC2)
a3	Vulnerability exploitation: CVE_2017_0143 (PC3)
a4	Vulnerability exploitation: CVE_2017_7269 (DB)
a5	Credential use: User1 (DB)
a6	Vulnerability exploitation: CVE_2016_3225 (DB)
a7	Credential use: Admin (DC)
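
The numbered states above, together with the goal state G, can serve as the state set of a tabular Q-learning model. The following is a minimal sketch under stated assumptions, not the authors' implementation. The state-to-state moves are the ones that occur in the attack-path table of this document. The reward values, learning rate, discount factor, exploration rate, and the names `TRANSITIONS`, `train`, and `greedy_path` are all hypothetical, and the sketch omits the paper's solution space refinement and exploration policy optimization.

```python
import random

# Moves between state numbers as they occur in the attack-path table.
TRANSITIONS = {
    1: [2, 14, 18], 2: [3, 11, 13], 3: [4, 6, 9], 4: [5, 8], 5: [7],
    6: ["G"], 7: ["G"], 8: ["G"], 9: [5, 10], 10: ["G"], 11: [8, 12],
    12: [5, 7], 13: [12], 14: [15, 16], 15: ["G"], 16: [17], 17: ["G"],
    18: [16],
}

# Hypothetical rewards and hyperparameters (not taken from the paper).
STEP_REWARD = -1.0
GOAL_REWARD = 100.0
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3


def train(episodes=5000, seed=0):
    """Tabular Q-learning with an epsilon-greedy exploration policy."""
    rng = random.Random(seed)
    q = {(s, n): 0.0 for s, moves in TRANSITIONS.items() for n in moves}
    for _ in range(episodes):
        state = 1
        while state != "G":
            moves = TRANSITIONS[state]
            if rng.random() < EPSILON:
                nxt = rng.choice(moves)  # explore
            else:
                nxt = max(moves, key=lambda n: q[(state, n)])  # exploit
            if nxt == "G":
                reward, future = GOAL_REWARD, 0.0
            else:
                reward = STEP_REWARD
                future = max(q[(nxt, n)] for n in TRANSITIONS[nxt])
            # Standard Q-learning update toward the bootstrapped target.
            q[(state, nxt)] += ALPHA * (reward + GAMMA * future - q[(state, nxt)])
            state = nxt
    return q


def greedy_path(q, start=1):
    """Follow the highest-valued move from each state until G is reached."""
    path = [start]
    while path[-1] != "G":
        s = path[-1]
        path.append(max(TRANSITIONS[s], key=lambda n: q[(s, n)]))
    return path


q_table = train()
print(greedy_path(q_table))
```

Because the hypothetical rewards only penalize each step, the greedy policy here favors the shortest route to G. The ranking reported in the paper depends on its own reward design, so the two need not agree.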

No.	Attack path	Cumulative reward
1	1→14→15→G1	978.1
2	1→14→16→17→G2	82.4
3	1→2→11→12→5→7→G3	1874.5
4	1→2→11→12→7→G4	1848.6
5	1→2→11→8→G5	1097.7
6	1→2→3→4→5→7→G6	2934.5
7	1→2→3→4→8→G7	2764.5
8	1→2→3→6→G8	3804.7
9	1→2→3→9→5→7→G9	3075.2
10	1→2→3→9→10→G10	3552.5
11	1→2→13→12→5→7→G11	1874.1
12	1→2→13→12→7→G12	1848.2
13	1→18→16→17→G13	16.1
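
The thirteen paths can be compared programmatically. The sketch below copies the table into a dictionary and ranks the paths. It assumes that a higher cumulative reward means a more preferred path, which this section does not state explicitly. It writes the terminal state as plain "G", dropping the trailing path index, and the names `PATHS`, `ranked`, `best_id`, and `shortest_id` are hypothetical.

```python
# Path number -> (sequence of state numbers, cumulative reward),
# copied from the attack-path table.
PATHS = {
    1: ([1, 14, 15, "G"], 978.1),
    2: ([1, 14, 16, 17, "G"], 82.4),
    3: ([1, 2, 11, 12, 5, 7, "G"], 1874.5),
    4: ([1, 2, 11, 12, 7, "G"], 1848.6),
    5: ([1, 2, 11, 8, "G"], 1097.7),
    6: ([1, 2, 3, 4, 5, 7, "G"], 2934.5),
    7: ([1, 2, 3, 4, 8, "G"], 2764.5),
    8: ([1, 2, 3, 6, "G"], 3804.7),
    9: ([1, 2, 3, 9, 5, 7, "G"], 3075.2),
    10: ([1, 2, 3, 9, 10, "G"], 3552.5),
    11: ([1, 2, 13, 12, 5, 7, "G"], 1874.1),
    12: ([1, 2, 13, 12, 7, "G"], 1848.2),
    13: ([1, 18, 16, 17, "G"], 16.1),
}

# Rank by cumulative reward, highest first (assumed selection criterion).
ranked = sorted(PATHS.items(), key=lambda item: item[1][1], reverse=True)
best_id, (best_states, best_reward) = ranked[0]

# The path with the fewest transitions, for comparison.
shortest_id = min(PATHS, key=lambda pid: len(PATHS[pid][0]))

print("highest reward:", best_id, best_states, best_reward)
print("fewest steps:", shortest_id, len(PATHS[shortest_id][0]) - 1)
for pid, (states, reward) in ranked[:3]:
    print(pid, "→".join(str(s) for s in states), reward)
```

On this data the highest cumulative reward belongs to path 8 (3804.7), while the path with the fewest steps is path 1, so the two criteria select different paths.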

