Journal on Communications, 2018, Vol. 39, Issue (11): 106-115. DOI: 10.11959/j.issn.1000-436x.2018238


Enhanced deep deterministic policy gradient algorithm

Jianping CHEN1,2,3,4, Chao HE1,2,3, Quan LIU5, Hongjie WU1,2,3,4, Fuyuan HU1,2,3,4, Qiming FU1,2,3,4

  1. Institute of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
  2. Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou 215009, China
  3. Suzhou Key Laboratory of Mobile Networking and Applied Technologies, Suzhou University of Science and Technology, Suzhou 215009, China
  4. Virtual Reality Key Laboratory of Intelligent Interaction and Application Technology of Suzhou, Suzhou University of Science and Technology, Suzhou 215009, China
  5. School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  • Revised: 2018-08-01 Online: 2018-11-01 Published: 2018-12-10
  • Supported by:
    The National Natural Science Foundation of China (61502329, 61772357, 61750110519, 61772355, 61702055, 61672371, 61602334, 61502323); The Natural Science Foundation of Jiangsu Province (BK20140283); The Key Research and Development Program of Jiangsu Province (BE2017663); The Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (13KJB520020); The Suzhou Industrial Application of Basic Research Program (SYG201422)

Abstract:

To address the slow convergence of the deep deterministic policy gradient (DDPG) algorithm, an enhanced deep deterministic policy gradient (E-DDPG) algorithm was proposed. Building on DDPG, two sample pools were constructed and the temporal difference (TD) error was introduced: during experience replay, samples were added with priorities based on the TD error, and during training, samples were drawn from the two pools separately. Meanwhile, a bisimulation metric was introduced to ensure the diversity of the selected samples and to improve the convergence rate of the algorithm. The E-DDPG algorithm was applied to the pendulum problem. Experimental results show that E-DDPG effectively improves convergence performance on continuous action space problems and offers better stability.
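As a rough illustration of the sampling scheme described in the abstract, the Python sketch below pairs a uniform pool with a TD-error-ranked pool and filters the mixed batch for diversity. The class and function names (DualReplayBuffer, diverse_subset), the 50/50 sampling ratio, and the use of Euclidean distance over states in place of the paper's bisimulation metric are all illustrative assumptions, not the authors' implementation.

import random
from collections import deque

import numpy as np


class DualReplayBuffer:
    """Two-pool experience replay: one uniform pool, one ranked by |TD error|.
    A hypothetical sketch of the dual-pool idea, not the paper's code."""

    def __init__(self, capacity=100_000):
        self.uniform_pool = deque(maxlen=capacity)    # plain FIFO pool
        self.priority_pool = deque(maxlen=capacity)   # (|td_error|, transition) pairs

    def add(self, transition, td_error):
        """Store a (s, a, r, s_next, done) tuple in both pools."""
        self.uniform_pool.append(transition)
        self.priority_pool.append((abs(td_error), transition))

    def sample(self, batch_size, ratio=0.5):
        """Draw part of the batch uniformly, the rest from the largest-|TD| samples.
        The 50/50 split is an assumed default."""
        n_priority = int(batch_size * ratio)
        n_uniform = batch_size - n_priority
        batch = random.sample(list(self.uniform_pool),
                              min(n_uniform, len(self.uniform_pool)))
        ranked = sorted(self.priority_pool, key=lambda p: p[0], reverse=True)
        batch += [t for _, t in ranked[:n_priority]]
        return batch


def diverse_subset(batch, min_dist=0.1):
    """Greedily keep transitions whose states are pairwise at least min_dist apart.
    Euclidean distance is a crude stand-in for the bisimulation metric, which
    also accounts for reward and transition similarity."""
    kept = []
    for transition in batch:
        s = np.asarray(transition[0], dtype=float)
        if all(np.linalg.norm(s - np.asarray(k[0], dtype=float)) >= min_dist
               for k in kept):
            kept.append(transition)
    return kept if kept else batch

In use, an agent would call add() after each environment step with the TD error computed from the critic, then train on diverse_subset(buffer.sample(batch_size)) so that high-error transitions are replayed often without the minibatch collapsing onto near-duplicate states.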

Key words: deep reinforcement learning, sample ranking, bisimulation metric, temporal difference error

