Journal on Communications ›› 2018, Vol. 39 ›› Issue (8): 56-68. doi: 10.11959/j.issn.1000-436x.2018145

• Papers Ⅰ: Artificial Intelligence and Network Security •

Defense decision-making method based on incomplete information stochastic game and Q-learning

Hongqi ZHANG (张红旗)1,2, Junnan YANG (杨峻楠)1,2, Chuanfu ZHANG (张传富)1,2

  1. The Third Institute, Information Engineering University, Zhengzhou 450001, China
  2. Henan Province Key Laboratory of Information Security, Zhengzhou 450001, China
  • Revised: 2018-07-10 Online: 2018-08-01 Published: 2018-09-13
  • About the authors: Hongqi ZHANG (1962-), male, born in Zunhua, Hebei, Ph.D., professor and doctoral supervisor at Information Engineering University. His research interests include network security, risk assessment, classified protection, and information security management. | Junnan YANG (1993-), male, born in Gaocheng, Hebei, master's student at Information Engineering University. His research interests include network information security, game theory, and reinforcement learning. | Chuanfu ZHANG (1973-), male, born in Laiwu, Shandong, postdoctoral fellow, associate professor at Information Engineering University. His research interests include computer modeling and simulation technology.
  • Supported by:
    The National High Technology Research and Development Program of China (863 Program) (2014AA7116082); The National High Technology Research and Development Program of China (863 Program) (2015AA7116040)

Abstract:

Most existing stochastic game models assume complete information, which does not match the reality of network attack and defense. To address this problem, the defender's uncertainty about the attacker's payoff is transformed into uncertainty about the attacker's type, and a stochastic game model with incomplete information is constructed. Because network state transition probabilities are difficult to determine, the parameters needed to solve for the equilibrium cannot be fixed in advance. To address this problem, Q-learning is introduced into the stochastic game, so that the defender learns the relevant parameters during attack-defense confrontation and uses them to solve for the Bayesian Nash equilibrium. On this basis, a defense decision algorithm capable of online learning is designed. Simulation experiments verify the effectiveness of the proposed method.

Key words: network attack and defense, stochastic game, Q-learning, Bayesian Nash equilibrium, defense decision-making
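As a rough illustration of the learning step mentioned in the abstract, the sketch below shows standard tabular Q-learning for a defender that does not know the state transition probabilities. It is not the authors' algorithm: it omits attacker types and the Bayesian Nash equilibrium computation entirely. The state names, action names, rewards, and transition probabilities in `simulate_step` are all hypothetical values chosen for the example.

```python
import random
from collections import defaultdict

# Hypothetical toy setting: these state and action names are illustrative only.
STATES = ["secure", "compromised"]
ACTIONS = ["patch", "monitor"]


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update and return the new value."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def simulate_step(state, action, rng):
    """Assumed environment, hidden from the learner: returns (reward, next_state)."""
    # Assumed probabilities; the learner never reads them directly.
    p_secure = 0.9 if action == "patch" else 0.4
    next_state = "secure" if rng.random() < p_secure else "compromised"
    cost = 1.0 if action == "patch" else 0.2
    reward = (5.0 if next_state == "secure" else -5.0) - cost
    return reward, next_state


def train(steps=5000, epsilon=0.1, seed=0):
    """Learn Q-values online from sampled transitions only."""
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    state = "secure"
    for _ in range(steps):
        action = choose_action(q, state, epsilon, rng)
        reward, next_state = simulate_step(state, action, rng)
        q_update(q, state, action, reward, next_state)
        state = next_state
    return q
```

Calling `q = train()` and comparing `q[("secure", "patch")]` with `q[("secure", "monitor")]` shows which defensive action the learner currently prefers in each state. The point of the sketch is that the update uses only observed rewards and next states, so the transition probabilities never have to be specified to the learner.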
