Chinese Journal of Intelligent Science and Technology, 2022, Vol. 4, Issue (3): 418-425. doi: 10.11959/j.issn.2096-6652.202244

• Academic Paper •

A path planning method for complex naval battlefields based on an improved DQN algorithm

Zhou YU1, Jing BI1, Haitao YUAN2

  1 Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
    2 School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
  • Revised: 2022-08-15  Online: 2022-09-15  Published: 2022-09-01
  • About the authors: Zhou YU (1998- ), male, is a master's degree candidate at the Faculty of Information Technology, Beijing University of Technology. His main research interests include deep reinforcement learning, and task scheduling and resource allocation in cloud data centers.
    Jing BI (1979- ), female, Ph.D., is an associate professor and doctoral supervisor at the Faculty of Information Technology, Beijing University of Technology. Her main research interests include big data analysis, computational intelligence and control optimization, deep learning, data-driven modeling, cloud-edge collaboration, and energy saving.
    Haitao YUAN (1986- ), male, Ph.D., is an associate professor at the School of Automation Science and Electrical Engineering, Beihang University. His main research interests include cloud computing, edge computing, data centers, deep learning, intelligent optimization, and time-series feature modeling and prediction.
  • Supported by:
    The National Natural Science Foundation of China (62073005, 62173013)

Abstract:

To effectively solve the target-tracking problem of multiple warships in a naval battlefield environment, a path planning method based on an improved deep Q-network (DQN) algorithm is proposed, taking multiple agents (warships) as the research objects. Building on the traditional DQN algorithm and the characteristics of a multi-agent reinforcement learning environment, the improved algorithm adds a second network with the same structure but different parameters and updates the actual and estimated Q values separately, so that the value function converges. In addition, the method adopts experience replay and a double-parameter update mechanism for the target network, which effectively mitigates the large training errors, poor generalization ability, and unstable training of neural networks. Experimental results demonstrate that, compared with the traditional DQN algorithm, the improved algorithm adapts more quickly to complex and changing naval battlefield environments of multiple types, more than doubles the obstacle-avoidance capability, and obtains higher training rewards.
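The double-network update described above can be illustrated with a short sketch. The following PyTorch code is a minimal, hypothetical example of the general mechanism (an online network producing estimated Q values, a structurally identical target network producing the target Q values, experience replay, and periodic parameter synchronization); it is not the authors' implementation, and the class names, network sizes, hyperparameters, and environment interface are assumptions made for illustration only.

```python
# Hypothetical sketch of a DQN with an online network, a separate target
# network of identical structure, and experience replay. Not the paper's code.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim


class QNet(nn.Module):
    """Simple fully connected Q network: state -> one Q value per action."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)


class DQNAgent:
    def __init__(self, state_dim, n_actions, gamma=0.99, lr=1e-3,
                 buffer_size=50_000, batch_size=64, sync_every=500):
        self.online = QNet(state_dim, n_actions)   # produces estimated Q values
        self.target = QNet(state_dim, n_actions)   # produces target ("actual") Q values
        self.target.load_state_dict(self.online.state_dict())
        self.optimizer = optim.Adam(self.online.parameters(), lr=lr)
        self.buffer = deque(maxlen=buffer_size)    # experience replay memory
        self.gamma, self.batch_size, self.sync_every = gamma, batch_size, sync_every
        self.n_actions, self.step_count = n_actions, 0

    def act(self, state, epsilon=0.1):
        """Epsilon-greedy action selection from the online network."""
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            q = self.online(torch.as_tensor(state, dtype=torch.float32))
        return int(q.argmax().item())

    def remember(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def learn(self):
        if len(self.buffer) < self.batch_size:
            return None
        batch = random.sample(self.buffer, self.batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        states = torch.as_tensor(states, dtype=torch.float32)
        actions = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
        rewards = torch.as_tensor(rewards, dtype=torch.float32)
        next_states = torch.as_tensor(next_states, dtype=torch.float32)
        dones = torch.as_tensor(dones, dtype=torch.float32)

        # Estimated Q values from the online network for the actions taken.
        q_estimate = self.online(states).gather(1, actions).squeeze(1)
        # Target Q values computed with the frozen target network.
        with torch.no_grad():
            q_next = self.target(next_states).max(dim=1).values
            q_target = rewards + self.gamma * (1.0 - dones) * q_next

        loss = nn.functional.mse_loss(q_estimate, q_target)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        # Periodically copy online parameters into the target network.
        self.step_count += 1
        if self.step_count % self.sync_every == 0:
            self.target.load_state_dict(self.online.state_dict())
        return float(loss.item())
```

In such a setup, an agent would select maneuvers with act(), store each transition with remember(), and call learn() every step; keeping the target network's parameters frozen between synchronizations is what decouples the estimated and target Q values and stabilizes training.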

Key words: deep Q-network, reinforcement learning, multi-agent, path planning, target tracking
