Chinese Journal of Intelligent Science and Technology, 2022, Vol. 4, Issue (3): 418-425. doi: 10.11959/j.issn.2096-6652.202244

• Academic Paper •

A path planning method for complex naval battlefields based on an improved DQN algorithm

Zhou YU1, Jing BI1, Haitao YUAN2

  1 Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
    2 School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
  • Revised: 2022-08-15  Online: 2022-09-15  Published: 2022-09-01
  • About the authors: Zhou YU (1998- ), male, is a master's degree candidate at the Faculty of Information Technology, Beijing University of Technology. His main research interests include deep reinforcement learning, and task scheduling and resource allocation in cloud data centers.
    Jing BI (1979- ), female, Ph.D., is an associate professor and doctoral supervisor at the Faculty of Information Technology, Beijing University of Technology. Her main research interests include big data analysis, computational intelligence and control optimization, deep learning, data-driven modeling, cloud-edge collaboration, and energy saving.
    Haitao YUAN (1986- ), male, Ph.D., is an associate professor at the School of Automation Science and Electrical Engineering, Beihang University. His main research interests include cloud computing, edge computing, data centers, deep learning, intelligent optimization, and time-series feature modeling and prediction.
  • Supported by:
    The National Natural Science Foundation of China (62073005, 62173013)

Abstract:

To effectively solve the target-tracking problem of multiple warships in a naval battlefield environment, a path planning method based on an improved deep Q-network (DQN) algorithm is proposed, taking multiple agents (warships) as the research objects. Building on the traditional DQN algorithm and the characteristics of a multi-agent reinforcement learning environment, the improved algorithm adds a second network with the same structure but different parameters and updates the actual and estimated Q values separately, so that the value function converges. In addition, the method adopts experience replay and a double-parameter update mechanism for the target network, which effectively mitigates the large training errors, poor generalization ability, and unstable training of neural networks. Experimental results demonstrate that, compared with the traditional DQN algorithm, the improved algorithm adapts more quickly to complex and changing naval battlefield environments of multiple types, more than doubles the obstacle-avoidance capability, and obtains higher training rewards.
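The double-network update described above can be illustrated with a short sketch. The following PyTorch code is a minimal, hypothetical example of the general mechanism (an online network producing estimated Q values, a structurally identical target network producing the target Q values, experience replay, and periodic parameter synchronization); it is not the authors' implementation, and the class names, network sizes, hyperparameters, and environment interface are assumptions made for illustration only.

```python
# Hypothetical sketch of a DQN with an online network, a separate target
# network of identical structure, and experience replay. Not the paper's code.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim


class QNet(nn.Module):
    """Simple fully connected Q network: state -> one Q value per action."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)


class DQNAgent:
    def __init__(self, state_dim, n_actions, gamma=0.99, lr=1e-3,
                 buffer_size=50_000, batch_size=64, sync_every=500):
        self.online = QNet(state_dim, n_actions)   # produces estimated Q values
        self.target = QNet(state_dim, n_actions)   # produces target ("actual") Q values
        self.target.load_state_dict(self.online.state_dict())
        self.optimizer = optim.Adam(self.online.parameters(), lr=lr)
        self.buffer = deque(maxlen=buffer_size)    # experience replay memory
        self.gamma, self.batch_size, self.sync_every = gamma, batch_size, sync_every
        self.n_actions, self.step_count = n_actions, 0

    def act(self, state, epsilon=0.1):
        """Epsilon-greedy action selection from the online network."""
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            q = self.online(torch.as_tensor(state, dtype=torch.float32))
        return int(q.argmax().item())

    def remember(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def learn(self):
        if len(self.buffer) < self.batch_size:
            return None
        batch = random.sample(self.buffer, self.batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        states = torch.as_tensor(states, dtype=torch.float32)
        actions = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
        rewards = torch.as_tensor(rewards, dtype=torch.float32)
        next_states = torch.as_tensor(next_states, dtype=torch.float32)
        dones = torch.as_tensor(dones, dtype=torch.float32)

        # Estimated Q values from the online network for the actions taken.
        q_estimate = self.online(states).gather(1, actions).squeeze(1)
        # Target Q values computed with the frozen target network.
        with torch.no_grad():
            q_next = self.target(next_states).max(dim=1).values
            q_target = rewards + self.gamma * (1.0 - dones) * q_next

        loss = nn.functional.mse_loss(q_estimate, q_target)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        # Periodically copy online parameters into the target network.
        self.step_count += 1
        if self.step_count % self.sync_every == 0:
            self.target.load_state_dict(self.online.state_dict())
        return float(loss.item())
```

In such a setup, an agent would select maneuvers with act(), store each transition with remember(), and call learn() every step; keeping the target network's parameters frozen between synchronizations is what decouples the estimated and target Q values and stabilizes training.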

Key words: deep Q-network, reinforcement learning, multi-agent, path planning, target tracking
