Defense decision-making method based on incomplete information stochastic game and Q-learning

doi:10.11959/j.issn.1000-436x.2018145

Journal on Communications, 2018, Vol. 39, Issue (8): 56-68.

• Papers Ⅰ: Artificial Intelligence and Network Security •

Hongqi ZHANG¹,², Junnan YANG¹,², Chuanfu ZHANG¹,²

¹ The Third Institute, Information Engineering University, Zhengzhou 450001, China
² Henan Province Key Laboratory of Information Security, Zhengzhou 450001, China

Revised: 2018-07-10 Online: 2018-08-01 Published: 2018-09-13
About the authors: Hongqi ZHANG (1962-), male, born in Zunhua, Hebei, Ph.D., is a professor and doctoral supervisor at Information Engineering University. His main research interests include network security, risk assessment, classified protection, and information security management. | Junnan YANG (1993-), male, born in Gaocheng, Hebei, is a master's student at Information Engineering University. His main research interests include network information security, game theory, and reinforcement learning. | Chuanfu ZHANG (1973-), male, born in Laiwu, Shandong, postdoctoral researcher, is an associate professor at Information Engineering University. His main research interests include computer modeling and simulation technology.
Supported by:
The National High Technology Research and Development Program of China (863 Program) (2014AA7116082); The National High Technology Research and Development Program of China (863 Program) (2015AA7116040)

Abstract:

Most existing stochastic game models assume complete information, which is inconsistent with real network attack and defense. To address this problem, the defender's uncertainty about the attacker's payoff is transformed into uncertainty about the attacker's type, and an incomplete information stochastic game model is constructed. Because network state transition probabilities are difficult to determine, the parameters needed to solve for the equilibrium cannot be fixed in advance. To address this problem, Q-learning is introduced into the stochastic game, so that the defender obtains the relevant parameters by learning during attack-defense confrontation and uses them to solve for the Bayesian Nash equilibrium. On this basis, a defense decision algorithm capable of online learning is designed. Simulation experiments verify the effectiveness of the proposed method.

Key words: network attack and defense, stochastic game, Q-learning, Bayesian Nash equilibrium, defense decision-making
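The abstract describes introducing Q-learning so that the defender can learn the parameters it needs without knowing the state transition probabilities. As a rough illustration only, the sketch below applies the standard tabular Q-learning update of Watkins and Dayan (reference [11]) to a single defender. The paper's own update rule is not reproduced on this page; the state names, action names, reward value, and the function `q_update` are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions, alpha, gamma):
    """Apply one standard tabular Q-learning step and return the new value.

    q            -- mapping (state, action) -> estimated cumulative payoff
    alpha, gamma -- learning rate and discount factor
    """
    # Value of the best action available in the next state (0.0 if none).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical transition: in state "s1" the defender plays "d1",
# observes a payoff of 10.0, and the network moves to state "s2".
q = defaultdict(float)
new_value = q_update(q, "s1", "d1", 10.0, "s2", ["d1", "d2"],
                     alpha=0.5, gamma=0.5)
print(new_value)  # 5.0
```

No transition probability appears anywhere in the update, which is the model-free property the abstract relies on. In game-theoretic variants of Q-learning, the `max` term is commonly replaced by the equilibrium value of the next-state game; whether and how the paper does this is not shown here.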

CLC number: TP393.08

Hongqi ZHANG, Junnan YANG, Chuanfu ZHANG. Defense decision-making method based on incomplete information stochastic game and Q-learning[J]. Journal on Communications, 2018, 39(8): 56-68.

Figures/Tables (16): Figures 1-10, Tables 1-6

References (21)

[1]	HU H, ZHANG H, LIU Y, et al. Quantitative method for network security situation based on attack prediction[J]. Security & Communication Networks, 2017(4): 1-19.
[2]	HU H, LIU Y, ZHANG H, et al. Optimal network defense strategy selection based on incomplete information evolutionary game[J]. IEEE Access, 2018, PP(99): 1.
[3]	FALLAH M. A puzzle-based defense strategy against flooding attacks using game theory[J]. IEEE Transactions on Dependable & Secure Computing, 2010, 7(1): 5-19.
[4]	FILAR J, VRIEZE K. Competitive Markov decision processes[J]. Springer Berlin, 1996, 36(4): 343-358.
[5]	JIANG W, FANG B X, TIAN Z H, et al. Research on defense strategies selection based on attack-defense stochastic game model[J]. Journal of Computer Research and Development, 2010, 47(10): 1714-1723.
[6]	LYE K W, WING J M. Game strategies in network security[J]. International Journal of Information Security, 2005, 4(1-2): 71-86.
[7]	WEI L, SARWAT A, SAAD W, et al. Stochastic games for power grid protection against coordinated cyber-physical attacks[J]. IEEE Transactions on Smart Grid, 2016, PP(99): 1.
[8]	ARFAOUI A, LETAIFA A B, KRIBECHE A, et al. A stochastic game for adaptive security in constrained wireless body area networks[C]// Consumer Communications & Networking Conference. 2018: 1-7.
[9]	LEI C, ZHANG H Q, WAN L M, et al. Incomplete information Markov game theoretic approach to strategy generation for moving target defense[J]. Computer Communications, 2018, 116: 184-199.
[10]	LEI C, MA D H, ZHANG H Q. Optimal strategy selection for moving target defense based on Markov game[J]. IEEE Access, 2017, PP(99): 1.
[11]	WATKINS C J C H, DAYAN P. Technical note: Q-learning[J]. Machine Learning, 1992, 8(3-4): 279-292.
[12]	LIU T, HE Y X, XIONG Q. A Q-learning based real-time mitigating mechanism against LDoS attack and its modeling and simulation with CPN[J]. Journal of Computer Research and Development, 2011, 48(3): 432-439.
[13]	RANDRIANSOLO A S, PYEATT L D. Q-learning: from computer network security to software security[C]// International Conference on Machine Learning and Applications. 2015: 257-262.
[14]	YAN J, HE H, ZHONG X, et al. Q-learning-based vulnerability analysis of smart grid against sequential topology attacks[J]. IEEE Transactions on Information Forensics & Security, 2017, 12(1): 200-210.
[15]	HARSANYI J C, SELTEN R. A general theory of equilibrium selection in games[M]. Boston: MIT Press, 1988.
[16]	CORMEN T H, LEISERSON C E, RIVEST R L, et al. Introduction to algorithms[M]. Boston: MIT Press, 2009.
[17]	ZHANG H W, LI T. Optimal active defense based on multi-stage attack-defense signaling game[J]. Acta Electronica Sinica, 2017, 45(2): 431-439.
[18]	HUNG S M, GIVIGI S N. A Q-learning approach to flocking with UAVs in a stochastic environment[J]. IEEE Transactions on Cybernetics, 2016, 47(1): 186-197.
[19]	SZEPESVARI C, LITTMAN M. A unified analysis of value-function-based reinforcement-learning algorithms[J]. Neural Computation, 1999, 11(8): 2017-2059.
[20]	GORDON L, LOEB M, LUCYSHYN W, et al. 2015 CSI/FBI computer crime and security survey[C]// The 2014 Computer Security Institute. 2015: 48-64.
[21]	WANG Z, YUAN Y, AN B, et al. An overview of security games[J]. Journal of Command and Control, 2015, 1(2): 121-149.

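Symbol	Meaning	Symbol	Meaning
N	Set of players	II-SGM	Incomplete information stochastic game model
S	Set of game states	π_a(s_k,θ_i)	Strategy of an attacker of type θ_i in state s_k
Θ	Attacker type space	π_d(s_k)	Strategy of the defender in state s_k
A	Set of attacker actions	σ_a(s_k,a_m,θ_i)	Probability that an attacker of type θ_i selects a_m in state s_k
D	Set of defender actions	σ_d(s_k,d)	Probability that the defender selects d in state s_k
π	Set of attack and defense strategies	Q_a(s_i,a,d,θ_j)	Expected cumulative payoff of the attacker after the two sides take actions (a,d)
θ_i	Attacker type	Q_d(s_i,a,d,θ_j)	Expected cumulative payoff of the defender after the two sides take actions (a,d)
a_m	Attack action	V_a(s_i,π_a(s_i,θ_j),π_d(s_i),θ_j)	Expected cumulative payoff of the attacker after the two sides adopt strategies (π_a(s_i,θ_j),π_d(s_i))
d_m	Defense action	V_d(s_i,π_a(s_i,θ_j),π_d(s_i),θ_j)	Expected cumulative payoff of the defender after the two sides adopt strategies (π_a(s_i,θ_j),π_d(s_i))
π_a	Attacker strategy	p_A(s_i,θ_n)	Probability, in the defender's belief, that the attacker in state s_i is of type θ_n
π_d	Defender strategy	P_A	Defender's probability judgment of the distribution of attacker types
α	Learning rate	π^ε	ε-greedy strategy
γ	Discount factor	Q^*	Correct state-action payoff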
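To make the notation concrete, the sketch below computes a defender's expected payoff in one state from the quantities named in the table: the state-action payoffs Q_d, the mixed strategies σ_a and σ_d, and the belief p_A over attacker types. Weighting the per-type payoff by the belief is the usual Bayesian expectation and is an assumption here, since the paper's own formula is not shown on this page. The function name `expected_defender_payoff` and all numbers are hypothetical.

```python
def expected_defender_payoff(q_d, sigma_a, sigma_d, belief):
    """Belief-weighted expected payoff of the defender in a single state.

    q_d[theta][a][d]  -- defender payoff for attack a, defense d, type theta
    sigma_a[theta][a] -- probability that a type-theta attacker plays a
    sigma_d[d]        -- probability that the defender plays d
    belief[theta]     -- defender's probability that the attacker is theta
    """
    total = 0.0
    for theta, p_theta in belief.items():
        for a, p_a in sigma_a[theta].items():
            for d, p_d in sigma_d.items():
                total += p_theta * p_a * p_d * q_d[theta][a][d]
    return total


# Hypothetical state with two attacker types, two attacks, two defenses.
belief = {"theta1": 0.5, "theta2": 0.5}
sigma_a = {
    "theta1": {"a1": 1.0, "a2": 0.0},
    "theta2": {"a1": 0.5, "a2": 0.5},
}
sigma_d = {"d1": 0.5, "d2": 0.5}
q_d = {
    "theta1": {"a1": {"d1": 4.0, "d2": 0.0}, "a2": {"d1": 0.0, "d2": 0.0}},
    "theta2": {"a1": {"d1": 2.0, "d2": 2.0}, "a2": {"d1": -2.0, "d2": 6.0}},
}
value = expected_defender_payoff(q_d, sigma_a, sigma_d, belief)
print(value)  # 2.0
```

Each attacker type contributes 1.0 to the total in this example, so the belief-weighted payoff is 2.0.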

No.	Host	CVE ID	Service
a₁	Web server	CVE-2015-1635	HTTP
a₂	Web server	CVE-2017-7269	IIS
a₃	Web server	CVE-2014-8517	FTP
a₄	Bastion host	CVE-2014-3556	SMTP
a₅	File server	CVE-2014-4877	FTP
a₆	Database server	CVE-2013-4730	FTP
a₇	Database server	CVE-2016-6662	MySQL

Atomic defense action	d₁	d₂	d₃	d₄	d₅	d₆
Renew root data	√		√		√	√
Limit SYN/ICMP packets		√
Install Oracle patches	√
Reinstall Listener program	√				√
Uninstall delete Trojan		√				√
Limit access to MDSYS		√		√
Restart Database server			√	√	√
Delete suspicious account		√
Add physical resource	√			√	√	√
Repair database			√	√
Limit packets from ports	√	√	√			√

State	Description
s₁	H₁(root),H₂(user),H₃(none),H₄(none),H₅(none)
s₂	H₁(root),H₂(user),H₃(user),H₄(none),H₅(none)
s₃	H₁(root),H₂(root),H₃(user),H₄(none),H₅(none)
s₄	H₁(root),H₂(user/root),H₃(root),H₄(none),H₅(none)
s₅	H₁(root),H₂(user/root),H₃(root),H₄(user),H₅(none)
s₆	H₁(root),H₂(user/root),H₃(root),H₄(none),H₅(user)
s₇	H₁(root),H₂(user/root),H₃(root),H₄(none/user),H₅(root)

No.	ε	α	γ
Setting 1	0.1	0.6	0.3
Setting 2	0.2	0.4	0.6
Setting 3	0.3	0.5	0.5
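The three settings above vary the exploration rate ε, the learning rate α, and the discount factor γ. As a hedged illustration of what ε controls, the sketch below implements a generic ε-greedy action choice and runs it under each setting. The Q-values, the action names, and the helper `epsilon_greedy` are hypothetical and are not taken from the paper's experiments.

```python
import random

# Parameter settings as listed in the table above.
SETTINGS = {
    "Setting 1": {"epsilon": 0.1, "alpha": 0.6, "gamma": 0.3},
    "Setting 2": {"epsilon": 0.2, "alpha": 0.4, "gamma": 0.6},
    "Setting 3": {"epsilon": 0.3, "alpha": 0.5, "gamma": 0.5},
}


def epsilon_greedy(q_values, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise exploit.

    q_values -- mapping action -> current payoff estimate
    rng      -- a random.Random instance, passed in for reproducibility
    """
    if rng.random() < epsilon:
        return rng.choice(sorted(q_values))  # explore: any action
    return max(q_values, key=q_values.get)   # exploit: best-known action


# Hypothetical payoff estimates for three defense actions in one state.
q_values = {"d1": 1.0, "d2": 3.0, "d3": 2.0}
rng = random.Random(0)
for name, params in SETTINGS.items():
    picks = [epsilon_greedy(q_values, params["epsilon"], rng)
             for _ in range(1000)]
    share = picks.count("d2") / len(picks)
    print(f"{name}: greedy action chosen in {share:.1%} of draws")
```

A larger ε means the greedy action is chosen less often, trading short-term payoff for more exploration. α and γ do not affect action selection; they enter only the value update.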


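Method	Theoretical basis	Model assumption	Transition model	Transition probability	Strategy interdependence	Application
Method of reference [5]	Stochastic game	Complete information	Must be known	Fixed	Considered	Strategy selection
Method of reference [6]	Stochastic game	Complete information	Must be known	Fixed	Considered	Strategy selection
Method of reference [8]	Stochastic game	Complete information	Must be known	Fixed	Considered	Strategy selection
Method of reference [9]	Stochastic game	Incomplete information	Must be known	Fixed	Considered	Strategy selection
Method of reference [12]	Q-learning	-	Model-free	Dynamic	Not considered	Security mechanism
Method of reference [14]	Q-learning	-	Model-free	Dynamic	Not considered	Vulnerability analysis
Proposed method	Stochastic game + Q-learning	Incomplete information	Model-free	Dynamic	Considered	Strategy selection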

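Related articles (4)

[1]	Wei FAN, Cheng PENG, Dali ZHU, Yuqing WANG. Research on intrusion response strategy based on static Bayesian game in mobile edge computing networks[J]. Journal on Communications, 2023, 44(2): 70-81.
[2]	Yongjin HU, Jun MA, Yuanbo GUO, Han ZHANG. Research on active defense based on multi-stage cyber deception game[J]. Journal on Communications, 2020, 41(8): 32-42.
[3]	Shirui HUANG, Hengwei ZHANG, Jindong WANG, Ruiyu DOU. Network security threat warning method based on qualitative differential game[J]. Journal on Communications, 2018, 39(8): 29-36.
[4]	Jianming HUANG, Hengwei ZHANG, Jindong WANG, Shirui HUANG. Defense strategy selection method based on attack-defense evolutionary game model[J]. Journal on Communications, 2017, 38(1): 168-176.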