Chinese Journal of Intelligent Science and Technology, 2023, Vol. 5, Issue (4): 464-476. doi: 10.11959/j.issn.2096-6652.202341

• Academic Paper •

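Online multi-stage Colonel Blotto game solving method for resource allocation under contested condition

Shaofei CHEN, Mingwo ZOU, Xiaolong SU, Junren LUO, Junqiao FENG

  1. College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
  • Received: 2023-06-30 Revised: 2023-09-08 Online: 2023-12-15 Published: 2023-12-15
  • About the authors: Shaofei CHEN (1987- ), male, PhD, associate professor at the College of Intelligence Science and Technology, National University of Defense Technology. His research interests include multi-agent systems, machine learning, and game theory.
    Mingwo ZOU (1992- ), male, master's student at the College of Intelligence Science and Technology, National University of Defense Technology. His research interests include intelligent decision-making, game theory, and combinatorial optimization.
    Xiaolong SU (2000- ), male, master's student at the College of Intelligence Science and Technology, National University of Defense Technology. His research interest is reinforcement learning.
    Junren LUO (1989- ), male, PhD candidate at the College of Intelligence Science and Technology, National University of Defense Technology. His research interests include imperfect-information games and multi-agent learning.
    Junqiao FENG (2001- ), male, undergraduate student at the College of Military and Political Basic Education, National University of Defense Technology. His research interests include mission planning and game theory.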

Abstract:

Combat resource allocation on the future battlefield is a multi-stage adversarial problem subject to a total resource budget constraint, characterized by a highly complex environment, dynamic uncertainty, and strongly adversarial game play. Building on the Blotto game model, the paper first models resource allocation in multi-stage adversarial scenarios as a two-level online Blotto game. It then transforms the original problem into an online shortest-path problem on a directed acyclic graph, which gives the resource allocation problem an intuitive formulation, and analyzes and solves it by drawing on the Lagrange game. In addition, the LagrangeBwK-Exp3-G algorithm is proposed to minimize high-probability regret for resource allocation under multi-stage adversarial conditions, and a high-probability regret bound in terms of the time horizon T is derived mathematically. Finally, a multi-channel power allocation experiment for satellite communication under multi-stage adversarial conditions is designed, and its results verify that the LagrangeBwK-Exp3-G algorithm performs well.

Key words: multi-stage adversarial, Blotto game, resource allocation, high probability regret
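The abstract does not spell out how an allocation becomes a path, so the following is only a minimal sketch of one common encoding, which may differ from the graph the authors actually use. It assumes a layered directed acyclic graph whose node (i, j) means "the first i battlefields have been decided and j resource units have been spent"; an edge from (i, j) to (i+1, j+k) puts k units on battlefield i. Every source-to-sink path is then one allocation of the full budget. The function names, the `losses` table, and the assumption that loss is additive over battlefields are all hypothetical choices made for illustration.

```python
from math import comb


def count_allocations(num_fields: int, budget: int) -> int:
    """Count source-to-sink paths in the assumed layered DAG."""
    # paths[j] = number of paths reaching node (i, j) in the current layer i
    paths = [1] + [0] * budget
    for _ in range(num_fields):
        nxt = [0] * (budget + 1)
        for spent, ways in enumerate(paths):
            # an edge may spend any number of the remaining units
            for new_spent in range(spent, budget + 1):
                nxt[new_spent] += ways
        paths = nxt
    return paths[budget]


def best_allocation(losses, budget):
    """Shortest path by dynamic programming over the layers.

    losses[i][k] is the (hypothetical) loss of placing k units on
    battlefield i; each row must have budget + 1 entries.
    Returns (total_loss, allocation) with sum(allocation) == budget.
    """
    num_fields = len(losses)
    inf = float("inf")
    dist = [[inf] * (budget + 1) for _ in range(num_fields + 1)]
    choice = [[0] * (budget + 1) for _ in range(num_fields + 1)]
    dist[0][0] = 0.0
    for i in range(num_fields):
        for spent in range(budget + 1):
            if dist[i][spent] == inf:
                continue
            for k in range(budget - spent + 1):
                cand = dist[i][spent] + losses[i][k]
                if cand < dist[i + 1][spent + k]:
                    dist[i + 1][spent + k] = cand
                    choice[i + 1][spent + k] = k
    # walk back from the sink (num_fields, budget) to recover the path
    allocation, spent = [], budget
    for i in range(num_fields, 0, -1):
        k = choice[i][spent]
        allocation.append(k)
        spent -= k
    allocation.reverse()
    return dist[num_fields][budget], allocation
```

Under this encoding the number of paths equals the stars-and-bars count `comb(budget + num_fields - 1, num_fields - 1)`, which grows quickly with the budget and the number of battlefields; that growth is one reason a path-based formulation is convenient, since a learner can work with edges rather than enumerating allocations. In the online setting the edge losses would change from round to round and be revealed only through feedback, which this offline sketch does not model.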

