A Survey of Intelligent Gaming: Implications of Game AI for Wargaming
