Journal on Communications, 2018, Vol. 39, Issue (8): 37-47. doi: 10.11959/j.issn.1000-436x.2018133

• Artificial Intelligence and Network Security

Heuristic Sarsa algorithm based on value function transfer

Jianping CHEN 1,2,3; Zhengxia YANG 1,2,3; Quan LIU 4; Hongjie WU 1,2,3; Yang XU 5; Qiming FU 1,2,3

    1 Institute of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
    2 Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou 215009, China
    3 Suzhou Key Laboratory of Mobile Networking and Applied Technologies, Suzhou University of Science and Technology, Suzhou 215009, China
    4 School of Computer Science and Technology, Soochow University, Suzhou 215000, China
    5 Institute of Information Engineering, Zhejiang Fashion Institute of Technology College, Ningbo 315000, China
  • Revised: 2018-07-13  Online: 2018-08-01  Published: 2018-09-13
  • Supported by:
    The National Natural Science Foundation of China (61502329, 61772357, 61750110519, 61772355, 61702055, 61672371, 61602334); The Natural Science Foundation of Jiangsu Province (BK20140283); The Key Research and Development Program of Jiangsu Province (BE2017663); High School Natural Science Foundation of Jiangsu Province (13KJB520020); Suzhou Industrial Application of Basic Research Program Part (SYG201422)

Abstract:

To address the slow convergence of the traditional Sarsa algorithm, an improved heuristic Sarsa algorithm based on value function transfer is proposed. The algorithm combines traditional Sarsa with a value function transfer method: it introduces a bisimulation metric to measure the similarity between a new task and historical tasks that share the same state space and action space, so that value functions can be transferred from similar historical tasks to speed up convergence. In addition, as a heuristic exploration method, the algorithm introduces Bayesian inference and uses variational inference to measure information gain. The obtained information gain is then used to build an intrinsic reward function model that serves as an exploration factor, further speeding up convergence. The proposed algorithm is applied to the traditional Grid World problem and compared with the traditional Sarsa algorithm, the Q-Learning algorithm, and the VFT-Sarsa and IGP-Sarsa algorithms, which have better convergence performance. The experimental results show that the proposed algorithm converges faster and more stably.
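
The two ideas in the abstract, starting from a transferred value function and adding an intrinsic exploration reward, can be illustrated with a minimal tabular Sarsa sketch. This is not the authors' implementation. The 4x4 grid, the reward scheme, all function and parameter names, and the hyperparameters are assumptions made for illustration. The warm start simply reuses a Q-table learned on an identical task, where the paper selects a source task with a bisimulation metric. A count-based bonus stands in for the variational information gain.

```python
import random

# Hypothetical 4x4 grid world: start (0, 0), goal (3, 3),
# reward -1 per step and 0 on reaching the goal.
SIZE = 4
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move inside the grid; bumping into a wall leaves the state unchanged."""
    row = min(max(state[0] + ACTIONS[action][0], 0), SIZE - 1)
    col = min(max(state[1] + ACTIONS[action][1], 0), SIZE - 1)
    next_state = (row, col)
    done = next_state == GOAL
    return next_state, (0.0 if done else -1.0), done


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    values = [q[(state, a)] for a in range(len(ACTIONS))]
    return values.index(max(values))


def sarsa(episodes, q_init=None, bonus_scale=0.0, alpha=0.5, gamma=0.95,
          epsilon=0.1, seed=0, max_steps=200):
    """Tabular Sarsa.

    q_init      -- stand-in for a transferred value function (assumption).
    bonus_scale -- weight of a count-based intrinsic reward, a simple
                   substitute for the information-gain reward.
    """
    rng = random.Random(seed)
    states = [(r, c) for r in range(SIZE) for c in range(SIZE)]
    q = {(s, a): 0.0 for s in states for a in range(len(ACTIONS))}
    if q_init is not None:
        q.update(q_init)  # warm start from the source task
    visits = {}
    steps_per_episode = []
    for _ in range(episodes):
        state = (0, 0)
        action = epsilon_greedy(q, state, epsilon, rng)
        for t in range(1, max_steps + 1):
            next_state, reward, done = step(state, action)
            visits[(state, action)] = visits.get((state, action), 0) + 1
            # Intrinsic bonus shrinks as a state-action pair is revisited.
            reward += bonus_scale / visits[(state, action)] ** 0.5
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            # On-policy target: uses the action actually chosen next.
            target = reward if done else reward + gamma * q[(next_state, next_action)]
            q[(state, action)] += alpha * (target - q[(state, action)])
            state, action = next_state, next_action
            if done:
                break
        steps_per_episode.append(t)
    return q, steps_per_episode


# Usage: learn a source task, then compare cold and warm starts.
source_q, _ = sarsa(episodes=200)
_, cold_steps = sarsa(episodes=20, seed=1)
_, warm_steps = sarsa(episodes=20, q_init=source_q, seed=1)
_, bonus_steps = sarsa(episodes=20, bonus_scale=0.5, seed=1)
print("cold:", sum(cold_steps), "warm:", sum(warm_steps), "bonus:", sum(bonus_steps))
```

Comparing the total step counts printed for the cold, warm, and bonus runs gives a rough feel for how initialization and exploration bonuses affect early learning in this toy setting. It says nothing about the results reported in the paper.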

Key words: reinforcement learning, value function transfer, bisimulation metric, variational Bayes
