Chinese Journal of Intelligent Science and Technology ›› 2022, Vol. 4 ›› Issue (2): 157-173. doi: 10.11959/j.issn.2096-6652.202209
Yuxiang SUN1, Yihui PENG1, Bin LI1, Jiawei ZHOU1, Xinlei ZHANG1, Xianzhong ZHOU1,2
Yuxiang SUN (1990− ), male, is a doctoral student at the School of Management and Engineering, Nanjing University. His main research interests are intelligent games and combat deduction.
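The field of intelligent games has become one of the hotspots of current AI research, and both game AI and intelligent wargaming have achieved a series of breakthroughs in recent years. However, applying game AI to practical intelligent combat deduction still faces great difficulties. This paper comprehensively analyzes the overall research progress in the field of intelligent games at home and abroad, examines in detail the main attribute requirements of intelligent combat deduction, and discusses them in light of the latest developments in reinforcement learning. The feasibility of developing game AI into intelligent combat deduction is analyzed along three dimensions: mainstream research techniques in the field of intelligent games, related intelligent decision-making techniques, and the technical difficulties of combat deduction. Finally, suggestions for the future development of intelligent combat deduction are given. The aim is to give researchers in the field a reasonably clear picture of the current state of development and to offer valuable research ideas.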
Yuxiang SUN, Yihui PENG, Bin LI, et al. Overview of intelligent game: enlightenment of game AI to combat deduction[J]. Chinese Journal of Intelligent Science and Technology, 2022, 4(2): 157-173.
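Game/Wargame | State space | Action space | Number of decisions | Victory condition | Reward setting | Fog of war | Observation information | Opponent modeling | Scenario design |
Go | Medium | Medium | Medium | Area scoring/territory scoring | Simple | No | Simple | Medium | Fixed |
StarCraft II | Complex | Complex | Many | Single-task objective | Medium | Yes | Medium | Medium | Minor variation |
Dota 2 | Complex | Complex | Many | Single-task objective | Medium | Yes | Medium | Medium | Fixed |
CMANO | Very complex | Very complex | Huge | Multi-task objective | Complex | Yes | Complex | Complex | Major variation |
智戎·未来指挥官 (Future Commander) | Very complex | Very complex | Huge | Multi-task objective/score | Complex | Yes | Complex | Complex | Major variation |
Honor of Kings | Complex | Complex | Many | Single-task objective | Medium | Yes | Medium | Medium | Fixed |
Wargame: Red Dragon | Very complex | Very complex | Huge | Multi-task objective | Complex | Yes | Complex | Complex | Major variation |
MaCA | Medium | Medium | Medium | Score | Simple | Yes | Medium | Medium | Fixed |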