Online multi-stage Colonel Blotto game solving method for resource allocation under contested condition

doi:10.11959/j.issn.2096-6652.202341


Abstract:

The allocation of combat resources on the future battlefield is a multi-stage adversarial problem under a total resource budget constraint. It is characterized by a highly complex environment, dynamic uncertainty, and strongly adversarial interaction. Based on the Colonel Blotto game model, the resource allocation problem in multi-stage adversarial scenarios was first modeled as a two-level online Blotto game. It was then transformed into an online shortest path problem on a directed acyclic graph (DAG), which gives the problem an intuitive formulation. The resource allocation problem was then analyzed and solved by drawing on the Lagrangian game. In addition, the LagrangeBwK-Exp3-G algorithm was proposed to achieve high-probability regret minimization for resource allocation under multi-stage adversarial conditions, and a high-probability regret bound with respect to the time horizon T was derived mathematically. Finally, a multi-channel satellite communication power allocation experiment under multi-stage adversarial conditions was designed, and it verified the good performance of the LagrangeBwK-Exp3-G algorithm.

Key words: multi-stage adversarial, Blotto game, resource allocation, high probability regret

CLC number: TP39

Shaofei CHEN, Mingwo ZOU, Xiaolong SU, et al. Online multi-stage Colonel Blotto game solving method for resource allocation under contested condition[J]. Chinese Journal of Intelligent Science and Technology, 2023, 5(4): 464-476.
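The following is a minimal sketch of the Blotto-to-DAG reduction described in the abstract. It builds one common layered DAG for allocating $B$ indivisible units over $n$ battlefields. Vertex $(k, b)$ means the first $k$ battlefields have received $b$ units in total, and an edge $(k, b) \to (k+1, b+x)$ assigns $x$ units to battlefield $k+1$. This particular construction and the helper names `build_blotto_dag`, `count_paths`, and `path_to_allocation` are assumptions for illustration; the paper's own graph is not specified in this section. Under this construction every path from $(0,0)$ to $(n,B)$ is exactly one allocation, so the number of paths is $\binom{B+n-1}{n-1}$.

```python
from math import comb


def build_blotto_dag(n, B):
    """Hypothetical layered DAG for allocating B units over n battlefields.

    Vertex (k, b): the first k battlefields have received b units in total.
    Edge (k, b) -> (k + 1, b + x): battlefield k + 1 receives x units.
    """
    out_edges = {}
    for k in range(n):
        for b in range(B + 1):
            out_edges[(k, b)] = [(k + 1, b + x) for x in range(B - b + 1)]
    return out_edges


def count_paths(out_edges, n, B):
    """Count source (0, 0) -> sink (n, B) paths by backward dynamic programming."""
    ways = {(n, B): 1}  # other layer-n vertices leave budget unused: 0 paths
    for k in range(n - 1, -1, -1):
        for b in range(B + 1):
            ways[(k, b)] = sum(ways.get(v, 0) for v in out_edges[(k, b)])
    return ways[(0, 0)]


def path_to_allocation(path):
    """Read the per-battlefield allocation off a source-to-sink path."""
    return [path[i + 1][1] - path[i][1] for i in range(len(path) - 1)]


n, B = 3, 4
dag = build_blotto_dag(n, B)
print(count_paths(dag, n, B), comb(B + n - 1, n - 1))      # 15 15
print(path_to_allocation([(0, 0), (1, 1), (2, 1), (3, 4)]))  # [1, 0, 3]
```

Because paths and allocations correspond one to one, choosing an allocation each round becomes choosing a path. This is why an online shortest-path (or, with gains, longest-path) learner can play the Blotto game.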

Figures/Tables (7)

Figure 1

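Algorithm 1 LagrangeBwK

Input: resource budget $B_0$, $T$, algorithm ALG1, algorithm ALG2;

1: for $t = 1, \ldots, T$ do

2: ALG1 selects an arm $a_t \in S$;

3: ALG2 selects a resource $i_t \in R$;

4: Observe the reward $r_t(a_t)$ and the consumption $\omega_{t,i}(a_t)$, $\forall i \in R$;

5: Feed $\Gamma_{a_t, i_t}$ back to ALG1 as its reward;

6: Feed $\Gamma_{a_t, i_t}$ back to ALG2 as its cost;

7: end for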
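Algorithm 1 does not spell out the Lagrangian payoff $\Gamma_{a_t,i_t}$ in this section. The sketch below assumes the form used in the adversarial bandits-with-knapsacks literature, $\Gamma_t(a, i) = r_t(a) + 1 - \frac{T}{B}\,\omega_{t,i}(a)$. It also assumes a dummy "time" resource that consumes $B/T$ per round. Both are assumptions, not the authors' stated definition, and the helper names are hypothetical. The code shows one round of the feedback routing in steps 5 and 6, where the same scalar is a reward for ALG1 and a cost for ALG2.

```python
def lagrangian(reward, consumption, budget, horizon):
    """Assumed Lagrangian payoff: r + 1 - (T / B) * consumption."""
    return reward + 1.0 - (horizon / budget) * consumption


def lagrangebwk_feedback(reward, consumptions, chosen_resource, budget, horizon):
    """One round of steps 5-6 of Algorithm 1 (hypothetical helper).

    consumptions maps each real resource i to omega_{t,i}(a_t); a dummy
    'time' resource consuming budget / horizon per round is added here.
    Returns (reward passed to ALG1, cost passed to ALG2).
    """
    costs = dict(consumptions)
    costs.setdefault("time", budget / horizon)
    gamma = lagrangian(reward, costs[chosen_resource], budget, horizon)
    return gamma, gamma


print(lagrangebwk_feedback(0.3, {"power": 0.8}, "time", budget=50, horizon=100))
print(lagrangebwk_feedback(0.3, {"power": 0.8}, "power", budget=50, horizon=100))
```

With $T/B = 2$, choosing the time resource returns approximately $0.3$, which is just the raw reward. Choosing power returns approximately $-0.3$, so ALG2 (a cost minimizer) learns to point at whichever resource is being overspent.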


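Algorithm 2 Exp3-G

Input: $\beta_2 \in (0,1)$, $T$, DAG;

1: Initialization: for each edge $e \in E$ of the DAG, set $\omega_e^0 = 1$; set $Q = \{0\}$, $u_0 = s$, $k = 0$; $d$ denotes the terminal vertex of the DAG;

2: for $t = 1, 2, \cdots, T$ do

3: for $k \le n$ do

4: Sample a successor $u_{k+1}$ from the successor set $C(u_k)$ of $u_k$ with probability $q_{e_{u_k,u_{k+1}}}^t = (1-\gamma)\,\omega_{e_{u_k,u_{k+1}}}^t \frac{H_t(u_{k+1}, d)}{H_t(u_k, d)} + \gamma \frac{|\{p \in P : e \in p\}|}{|P|}$;

5: Add vertex $u_{k+1}$ to the set $Q$;

6: end for

7: Connect all vertices in $Q$ to obtain the sampled path $p_t \in P$, i.e., the allocation strategy;

8: Play against the opponent and obtain the gain $g_e^t$;

9: Compute the gain estimate $\tilde{g}_e^t = \begin{cases} \dfrac{g_e^t + \beta_2}{q_e^t}, & e \in p_t \\ \dfrac{\beta_2}{q_e^t}, & \text{otherwise} \end{cases}$;

10: Update the weight of each edge in the DAG: $\omega_e^{t+1} = \omega_e^t\, e^{\eta \tilde{g}_e^t}$;

11: end for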
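The core of Algorithm 2 can be sketched in Python as follows. $H(u, d)$ is computed by weight pushing, a path is sampled edge by edge with probability $\omega_{(u,v)} H(v,d)/H(u,d)$, the biased estimates of step 9 are formed, and the exponential update of step 10 is applied. This is a sketch under stated assumptions, not the authors' implementation. The class and helper names are hypothetical. The $\gamma$-exploration is implemented as a separate uniform-path draw, which mixes the path distribution exactly, rather than by the per-edge formula in step 4. Every edge is assumed to lie on some $s \to d$ path.

```python
import math
import random


def path_sums(nodes, edges, w, s, d):
    """Weight pushing on a DAG whose `nodes` are listed in topological order.

    F[u]: sum over s->u paths of the product of edge weights.
    H[u]: sum over u->d paths of the product of edge weights, i.e. H_t(u, d).
    """
    out = {u: [] for u in nodes}
    inc = {u: [] for u in nodes}
    for e in edges:
        out[e[0]].append(e)
        inc[e[1]].append(e)
    F = dict.fromkeys(nodes, 0.0)
    F[s] = 1.0
    for u in nodes:
        for e in inc[u]:
            F[u] += F[e[0]] * w[e]
    H = dict.fromkeys(nodes, 0.0)
    H[d] = 1.0
    for u in reversed(nodes):
        for e in out[u]:
            H[u] += w[e] * H[e[1]]
    return F, H, out


class Exp3G:
    """Hypothetical sketch of Exp3-G-style path selection on a DAG."""

    def __init__(self, nodes, edges, s, d, eta, gamma, beta, seed=0):
        self.nodes, self.edges, self.s, self.d = nodes, edges, s, d
        self.eta, self.gamma, self.beta = eta, gamma, beta
        self.rng = random.Random(seed)
        self.w = dict.fromkeys(edges, 1.0)
        self.ones = dict.fromkeys(edges, 1.0)
        F1, H1, _ = path_sums(nodes, edges, self.ones, s, d)
        self.n_paths = H1[s]
        # |{p in P : e in p}| / |P| for every edge e
        self.uniform_q = {e: F1[e[0]] * H1[e[1]] / H1[s] for e in edges}

    def edge_probs(self):
        """Marginal probability q_e^t that edge e lies on the sampled path."""
        F, H, _ = path_sums(self.nodes, self.edges, self.w, self.s, self.d)
        return {e: (1 - self.gamma) * F[e[0]] * self.w[e] * H[e[1]] / H[self.s]
                + self.gamma * self.uniform_q[e] for e in self.edges}

    def draw(self):
        """Sample a path edge by edge with probability w_(u,v) H(v, d) / H(u, d)."""
        w = self.ones if self.rng.random() < self.gamma else self.w
        _, H, out = path_sums(self.nodes, self.edges, w, self.s, self.d)
        u, path = self.s, []
        while u != self.d:
            cand = out[u]
            probs = [w[c] * H[c[1]] / H[u] for c in cand]
            e = self.rng.choices(cand, weights=probs)[0]
            path.append(e)
            u = e[1]
        return path

    def update(self, path, gains):
        """Steps 9-10: biased importance-weighted estimates, exponential update."""
        q = self.edge_probs()  # same weights as at draw time
        on_path = set(path)
        for e in self.edges:
            g = gains.get(e, 0.0) if e in on_path else 0.0
            self.w[e] *= math.exp(self.eta * (g + self.beta) / q[e])


nodes = ["s", "a", "b", "d"]
edges = [("s", "a"), ("s", "b"), ("a", "d"), ("b", "d"), ("s", "d")]
agent = Exp3G(nodes, edges, "s", "d", eta=0.05, gamma=0.1, beta=0.01)
path = agent.draw()
agent.update(path, {e: 1.0 for e in path})  # e.g. every played edge won
print(agent.n_paths, path)
```

Computing $H$ by one backward pass keeps each round linear in the number of edges, even though the number of paths (allocations) can be combinatorially large.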


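Algorithm 3 Hedge

Input: $\beta_1 \in (0,1)$, $T$;

1: Initialization: for every $i \in A$, set $\omega_i^1 = 1/|A|$, where $A$ denotes the action set;

2: for $t = 1, \cdots, T$ do

3: Sample an action $i_t \in A$ with probability $p_i^t = \omega_i^t / \sum_{j \in A} \omega_j^t$;

4: Receive the losses $l_i^t$, $\forall i \in A$;

5: Update the action weights: $\omega_i^{t+1} = \omega_i^t \cdot \beta_1^{l_i^t}$, $\forall i \in A$;

6: end for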
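Algorithm 3 is a full-information multiplicative-weights learner. A minimal Python sketch with the same conventions follows: initial weights $1/|A|$ and the update $\omega_i^{t+1} = \omega_i^t \beta_1^{l_i^t}$. The function name `hedge`, the list-of-lists loss format, and the assumption that losses lie in $[0, 1]$ are choices of this sketch.

```python
import random


def hedge(losses, beta, seed=0):
    """Hedge with multiplicative update w_i <- w_i * beta ** loss_i.

    losses[t][i] is the loss of action i in round t, assumed in [0, 1].
    Returns the sampled actions and the final normalized probabilities.
    """
    n = len(losses[0])
    w = [1.0 / n] * n  # step 1: uniform initial weights
    rng = random.Random(seed)
    actions = []
    for loss_t in losses:
        total = sum(w)
        p = [wi / total for wi in w]  # step 3: normalized weights
        actions.append(rng.choices(range(n), weights=p)[0])
        w = [wi * beta ** li for wi, li in zip(w, loss_t)]  # step 5
    total = sum(w)
    return actions, [wi / total for wi in w]


actions, p = hedge([[0.0, 1.0]] * 10, beta=0.5)
print(p)
```

After ten rounds in which action 1 always loses, its probability is $0.5^{10}/(1 + 0.5^{10}) = 1/1025$. This illustrates how quickly Hedge concentrates when every action's loss is observed.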


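Algorithm 4 LagrangeBwK-Exp3-G

Input: $B$, $T$, $n$, $m = cBT$, ALG1, ALG2;

1: for $t = 1, \cdots, T$ do

2: The opponent selects its allocation strategy;

3: Use ALG1 to sample a path $p_t \in S$;

4: if $x(\tau) \le m$ then

5: Use ALG1 to sample a path $p_t \in \tilde{S}$;

6: Terminate the algorithm;

7: end if

8: Compute $\Gamma_t^{\text{resource}}$ and $\Gamma_t^{\text{time}}$;

9: Use ALG2 to select $\Gamma_t^{\text{resource}}$ or $\Gamma_t^{\text{time}}$;

10: if ALG2 selects $\Gamma_t^{\text{resource}}$ then

11: Pass $\Gamma_t^{\text{resource}}$ to ALG1 as its reward;

12: else if ALG2 selects $\Gamma_t^{\text{time}}$ then

13: Pass $\Gamma_t^{\text{time}}$ to ALG1 as its reward;

14: end if

15: Use ALG1 to update the edge weights of the DAG;

16: Pass $\Gamma_t^{\text{resource}}$ and $\Gamma_t^{\text{time}}$ to ALG2 as losses to update their weights;

17: end for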
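The outer loop of Algorithm 4 can be sketched in Python as below. It is a toy stand-in under several loudly stated assumptions, not the authors' procedure. ALG1 is plain Exp3 over a short explicit list of allocations instead of Exp3-G on the DAG, and ALG2 is Hedge over the two terms {"resource", "time"}. The opponent spreads its units uniformly at random. $x(\tau)$ is read as the remaining budget and $m$ as a reserve threshold, and the $\tilde{S}$ fallback of step 5 is omitted. $\Gamma$ takes the assumed form $r + 1 - \frac{T}{B}c$, with the time term reducing to $r$. Payoffs are rescaled into $[0,1]$ before being fed to either learner. The function `run_lagrange_bwk` and all its parameters are hypothetical.

```python
import math
import random


def run_lagrange_bwk(allocs, T, B, m, n_fields, units, gamma=0.1, beta=0.7, seed=0):
    """Hypothetical sketch of the outer loop of Algorithm 4.

    ALG1: Exp3 over an explicit list of allocations (stand-in for Exp3-G).
    ALG2: Hedge over the two Lagrangian terms {"resource", "time"}.
    Returns (rounds played, total reward, budget spent).
    """
    rng = random.Random(seed)
    K = len(allocs)
    w1 = [1.0] * K
    w2 = {"resource": 1.0, "time": 1.0}
    lo, hi = 1.0 - T / B, 2.0  # range of Gamma when reward and cost lie in [0, 1]
    remaining, total_reward, rounds = float(B), 0.0, 0
    for _ in range(T):
        if remaining <= m:  # assumed reading of "if x(tau) <= m": stop
            break
        # Step 2: the opponent spreads `units` uniformly at random.
        opp = [0] * n_fields
        for _ in range(units):
            opp[rng.randrange(n_fields)] += 1
        # Step 3: ALG1 (Exp3) samples an allocation.
        total_w = sum(w1)
        p = [(1 - gamma) * wi / total_w + gamma / K for wi in w1]
        a = rng.choices(range(K), weights=p)[0]
        reward = sum(x > y for x, y in zip(allocs[a], opp)) / n_fields
        cost = sum(allocs[a]) / units  # normalized consumption in [0, 1]
        remaining -= cost
        total_reward += reward
        rounds += 1
        # Step 8: the two Lagrangian terms (assumed form).
        g_res = reward + 1.0 - (T / B) * cost
        g_time = reward  # time consumes B / T per round, so its term is r
        # Steps 9-14: ALG2 picks which term ALG1 receives as its reward.
        pick = rng.choices(list(w2), weights=list(w2.values()))[0]
        x = ((g_res if pick == "resource" else g_time) - lo) / (hi - lo)
        w1[a] *= math.exp(gamma * (x / p[a]) / K)
        top = max(w1)
        w1 = [wi / top for wi in w1]  # rescaling leaves Exp3 probabilities unchanged
        # Step 16: ALG2 (Hedge) is charged both terms as losses.
        for key, val in (("resource", g_res), ("time", g_time)):
            w2[key] *= beta ** ((val - lo) / (hi - lo))
    return rounds, total_reward, B - remaining


allocs = [(0, 0, 0), (1, 1, 1), (2, 1, 1), (2, 2, 0), (4, 0, 0)]
print(run_lagrange_bwk(allocs, T=200, B=100, m=1, n_fields=3, units=4))
```

With $m \ge 1$ and per-round consumption at most 1, the stopping check guarantees the budget is never exceeded. Charging ALG2 both terms lets it shift weight toward the resource term when spending outpaces $B/T$ per round.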


Figure 2

Figure 3

