Chinese Journal of Intelligent Science and Technology ›› 2020, Vol. 2 ›› Issue (4): 354-360. doi: 10.11959/j.issn.2096-6652.202038

• Special Issue: Deep Reinforcement Learning

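Depth control of autonomous underwater vehicle using deep reinforcement learning

Rizhong WANG, Huiping LI, Di CUI, Demin XU

  1. School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an 710072, China
  • Revised: 2020-11-30 Online: 2020-12-15 Published: 2020-12-01
  • About the authors:
    Rizhong WANG (1995- ), male, Ph.D. candidate at the School of Marine Science and Technology, Northwestern Polytechnical University. His research interests include multi-robot cooperation and reinforcement learning.
    Huiping LI (1983- ), male, Ph.D., professor and doctoral supervisor at the School of Marine Science and Technology, Northwestern Polytechnical University. His research interests include underwater robot navigation and positioning, intelligent decision-making and optimal control, model predictive control, and multi-robot cooperation.
    Di CUI (1995- ), female, Ph.D. candidate at the School of Marine Science and Technology, Northwestern Polytechnical University. Her research interests include event-triggered control and model predictive control.
    Demin XU (1937- ), male, academician of the Chinese Academy of Engineering, professor and doctoral supervisor at the School of Marine Science and Technology, Northwestern Polytechnical University. His research interests include the overall design of underwater vehicles and navigation, guidance and control.
  • Supported by:
    The National Natural Science Foundation of China (61922068); The National Natural Science Foundation of China (61733014); The Science Foundation for Distinguished Young Scholars of Shaanxi Province (2019JC-14); Northwestern Polytechnical University Aoxiang Youth Scholar Program (20GH0201111)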

Abstract:

The depth control problem of an autonomous underwater vehicle (AUV) was studied using a deep reinforcement learning method. Unlike traditional control algorithms, the deep reinforcement learning method lets the AUV learn the control law by itself, which avoids building an accurate model and designing the control law by hand. The deep deterministic policy gradient (DDPG) method was used to design two neural networks, an actor and a critic. The actor network outputs the control policy, that is, the control action the agent takes in each state. The critic network evaluates this policy by estimating the action-value function. AUV depth control was achieved by training these two networks. The effectiveness of the algorithm was verified by simulation on the OpenAI Gym platform.

Key words: autonomous underwater vehicle, depth control, deep reinforcement learning
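
The actor and critic roles described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: to stay short and dependency-free it swaps the two neural networks for a clipped linear actor and a quadratic-feature critic, and it uses a hypothetical two-state heave model (depth error and vertical velocity) in place of the paper's AUV dynamics and Gym environment. All constants, function names, and the reward are assumptions chosen for illustration. What it keeps from the deep deterministic policy gradient method is the update structure: a replay buffer, a temporal-difference target computed with target copies of both approximators, a critic step toward that target, an actor step along the critic's action gradient, and soft target updates.

```python
import random

GAMMA, TAU = 0.95, 0.05            # discount factor, target soft-update rate
LR_ACTOR, LR_CRITIC = 0.01, 0.05   # illustrative step sizes (assumptions)
DT, DAMPING = 0.1, 0.5             # hypothetical toy-plant constants


def step(state, action):
    """Toy heave model (an assumption, not the paper's AUV dynamics)."""
    err, vel = state
    next_state = (err + DT * vel, vel + DT * (action - DAMPING * vel))
    reward = -(err * err + 0.1 * action * action)
    return next_state, reward


def actor(theta, state):
    """Deterministic policy: clipped linear feedback on (error, velocity)."""
    raw = theta[0] * state[0] + theta[1] * state[1]
    return max(-1.0, min(1.0, raw))


def features(state, action):
    """Quadratic features standing in for the critic network."""
    err, vel = state
    return [err * err, err * vel, vel * vel,
            err * action, vel * action, action * action]


def critic(w, state, action):
    """Action-value estimate Q(s, a) as a weighted sum of features."""
    return sum(wi * fi for wi, fi in zip(w, features(state, action)))


def dq_da(w, state, action):
    """Analytic derivative of the critic with respect to the action."""
    err, vel = state
    return w[3] * err + w[4] * vel + 2.0 * w[5] * action


def td_target(reward, next_state, theta_targ, w_targ):
    """y = r + gamma * Q'(s', mu'(s')), using the target actor and critic."""
    next_action = actor(theta_targ, next_state)
    return reward + GAMMA * critic(w_targ, next_state, next_action)


def soft_update(target, online):
    """Move target parameters a fraction TAU toward the online parameters."""
    return [(1.0 - TAU) * t + TAU * o for t, o in zip(target, online)]


def train(episodes=5, steps=20, batch_size=8, seed=0):
    rng = random.Random(seed)
    theta, w = [0.0, 0.0], [0.0] * 6
    theta_targ, w_targ = list(theta), list(w)
    replay = []
    for _ in range(episodes):
        state = (rng.uniform(-1.0, 1.0), 0.0)
        for _ in range(steps):
            # Explore with Gaussian noise, then clip to the actuator range.
            noisy = actor(theta, state) + rng.gauss(0.0, 0.2)
            action = max(-1.0, min(1.0, noisy))
            next_state, reward = step(state, action)
            replay.append((state, action, reward, next_state))
            for s, a, r, s2 in rng.sample(replay, min(batch_size, len(replay))):
                # Critic: normalized semi-gradient step toward the TD target.
                phi = features(s, a)
                delta = td_target(r, s2, theta_targ, w_targ) - critic(w, s, a)
                scale = LR_CRITIC * delta / (1.0 + sum(f * f for f in phi))
                w = [wi + scale * fi for wi, fi in zip(w, phi)]
                # Actor: ascend dQ/da * dmu/dtheta. The clip is ignored in
                # dmu/dtheta, which is a simplification of this sketch.
                grad = dq_da(w, s, actor(theta, s))
                theta = [ti + LR_ACTOR * grad * si for ti, si in zip(theta, s)]
            theta_targ = soft_update(theta_targ, theta)
            w_targ = soft_update(w_targ, w)
            state = next_state
    return theta, w
```

Calling `train()` returns the actor gains and critic weights after a short run. The defaults are sized only to show the update loop executing, so the returned gains should not be read as a tuned depth controller. In a full implementation the linear actor and quadratic critic would be replaced by neural networks and `step` by the simulation environment.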

