Journal on Communications ›› 2017, Vol. 38 ›› Issue (4): 166-177. doi: 10.11959/j.issn.1000-436x.2017089

• Correspondences •

Actor-critic algorithm with incremental dual natural policy gradient

Peng ZHANG1, Quan LIU1,2,3, Shan ZHONG1, Jian-wei ZHAI1, Wei-sheng QIAN1

  1 School of Computer Science and Technology, Soochow University, Suzhou 215006, China
    2 Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
    3 Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
  • Revised: 2017-03-03  Online: 2017-04-01  Published: 2017-07-20
  • Supported by:
    The National Natural Science Foundation of China (61272005, 61303108, 61373094, 61472262, 61502323, 61502329); The Natural Science Foundation of Jiangsu Province (BK2012616); The Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (13KJB520020); The Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172014K04); The Suzhou Industrial Application of Basic Research Program (SYG201422, SYG201308)

Abstract:

Existing algorithms for continuous action spaces fail to consider how to select the optimal action and how to exploit knowledge of the action range, so an efficient actor-critic algorithm based on an improved natural gradient was proposed. Its objective was to maximize the expected return. The upper and lower bounds of the action range were weighted to obtain the optimal action, and both bounds were approximated by linear functions, which transformed the problem of obtaining the optimal action into the learning of two policy parameter vectors. To speed up learning, an incremental Fisher information matrix and eligibility traces for both bounds were designed. Simulation results on three reinforcement learning problems show that, compared with other representative continuous-action methods, the proposed algorithm converges faster and more stably.
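To make the abstract's construction concrete, the following is a minimal NumPy sketch of an actor-critic agent whose action is a weighted combination of learned lower and upper bounds of the action range, with a natural-gradient actor update driven by an incrementally maintained inverse Fisher matrix (realized here via Sherman-Morrison rank-1 updates). The class name, the Gaussian exploration model, the fixed weight kappa, and all hyper-parameters are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

class DualBoundActorCritic:
    """Minimal sketch of the abstract's idea, not the authors' exact
    algorithm: the greedy action is a weighted combination of learned
    lower/upper bounds of the action range, each a linear function of
    state features; the two parameter vectors are updated with a
    natural gradient built from an incrementally maintained Fisher
    information matrix. All names and hyper-parameters are assumed."""

    def __init__(self, n_features, alpha=0.01, beta=0.1,
                 gamma=0.99, lam=0.9, sigma=0.3):
        n = n_features
        self.theta_lo = np.zeros(n)       # lower-bound policy parameters
        self.theta_hi = np.zeros(n)       # upper-bound policy parameters
        self.w = np.zeros(n)              # critic (state-value) weights
        self.z_v = np.zeros(n)            # critic eligibility trace
        self.z_pi = np.zeros(2 * n)       # joint eligibility of both bounds
        # Inverse Fisher matrix over the stacked vector [theta_lo; theta_hi],
        # maintained incrementally via rank-1 (Sherman-Morrison) updates.
        self.F_inv = 100.0 * np.eye(2 * n)
        self.alpha, self.beta = alpha, beta
        self.gamma, self.lam, self.sigma = gamma, lam, sigma

    def action(self, phi, kappa=0.5):
        """Weight the two learned bounds to get the action mean, then
        explore with Gaussian noise (an assumed exploration model)."""
        mean = kappa * (self.theta_lo @ phi) + (1 - kappa) * (self.theta_hi @ phi)
        return np.random.normal(mean, self.sigma), mean

    def update(self, phi, a, mean, r, phi_next, done, kappa=0.5):
        # Critic: linear TD(lambda).
        v_next = 0.0 if done else self.w @ phi_next
        delta = r + self.gamma * v_next - self.w @ phi
        self.z_v = self.gamma * self.lam * self.z_v + phi
        self.w += self.beta * delta * self.z_v
        # Actor: score of the Gaussian policy w.r.t. both bound vectors.
        g = (a - mean) / self.sigma ** 2
        score = np.concatenate([kappa * g * phi, (1 - kappa) * g * phi])
        self.z_pi = self.gamma * self.lam * self.z_pi + score
        # Incremental inverse-Fisher update: Sherman-Morrison on F + s s^T.
        Fs = self.F_inv @ score
        self.F_inv -= np.outer(Fs, Fs) / (1.0 + score @ Fs)
        # Natural-gradient step on the stacked parameter vector.
        step = self.alpha * delta * (self.F_inv @ self.z_pi)
        n = len(phi)
        self.theta_lo += step[:n]
        self.theta_hi += step[n:]
```

The Sherman-Morrison step keeps the inverse Fisher matrix up to date in O(n^2) per transition instead of re-inverting it, which is one plausible reading of the "incremental Fisher information matrix" the abstract mentions.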

Key words: reinforcement learning, natural gradient, actor-critic, continuous space

CLC Number: 
