Chinese Journal of Network and Information Security ›› 2023, Vol. 9 ›› Issue (4): 16-28.doi: 10.11959/j.issn.2096-109x.2023050

• Papers •

Function approximation method based on weights gradient descent in reinforcement learning

Xiaoyan QIN1, Yuhan LIU2, Yunlong XU3, Bin LI4   

  1. 1 School of Information and Software, Global Institute of Software Technology, Suzhou 215163, China
    2 University of Waterloo, Waterloo, N2L3G4, Canada
    3 Applied Technology College, Soochow University, Suzhou 215325, China
    4 School of Computer Science and Technology, Soochow University, Suzhou 215325, China
  • Revised:2023-05-30 Online:2023-08-01 Published:2023-08-01
  • Supported by:
    The National Natural Science Foundation of China (61772355); The National Natural Science Foundation of China (61702055); The National Natural Science Foundation of China (61876217); The National Natural Science Foundation of China (62176175); Jiangsu Province Natural Science Research University Major Projects (18KJA520011); Jiangsu Province Natural Science Research University Major Projects (17KJA520004); Suzhou Industrial Application of Basic Research Program Part (SYG201422); Jiangsu Province High End Research and Training Project for Professional Leaders of Teachers in Vocational Colleges (2021GRFX052); Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions; Project Funded by Jiangsu Province Vocational Education Double-Qualified Teaching Studio in Software Technology

Abstract:

Function approximation has gained significant attention in reinforcement learning research because it effectively addresses problems with large-scale, continuous state and action spaces. Although function approximation based on gradient descent is one of the most widely used methods in reinforcement learning, it requires careful tuning of the step-size parameter: an inappropriate value can lead to slow convergence, unstable convergence, or even divergence. To address these issues, an improvement was made to the temporal-difference (TD) algorithm based on function approximation. The weight update was enhanced by combining the least-squares method with gradient descent, yielding the proposed weights gradient descent (WGD) method. Least squares were used to calculate target weights; combining the ideas of TD and gradient descent, the error between these target weights and the current weights was computed, and this error was used to update the weights directly. Updating the weights in this new manner effectively reduces the algorithm's consumption of computing resources and enhances other gradient-descent-based function approximation algorithms. The WGD method is widely applicable to various gradient-descent-based reinforcement learning algorithms. The results show that WGD can adjust parameters within a wider space, effectively reducing the possibility of divergence, while achieving better performance and faster convergence.
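The abstract describes solving for weights by least squares and then using the error between those weights and the current weights as the update direction. A minimal sketch of that idea, assuming linear function approximation and an LSTD-style least-squares system (the function name `wgd_update`, the regularization term, and the exact system are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def wgd_update(w, phi, phi_next, rewards, gamma=0.99, alpha=0.5, reg=1e-3):
    """One weights-gradient-descent (WGD) style update, as sketched here:
    solve a least-squares (LSTD-like) system for target weights, then
    step the current weights toward them.

    w        : (d,)   current weight vector
    phi      : (n, d) feature matrix of visited states
    phi_next : (n, d) feature matrix of successor states
    rewards  : (n,)   observed rewards
    """
    # LSTD-style system: A w_ls = b, with A = Phi^T (Phi - gamma Phi'), b = Phi^T r.
    # A small ridge term keeps A invertible (an assumption for this sketch).
    A = phi.T @ (phi - gamma * phi_next) + reg * np.eye(phi.shape[1])
    b = phi.T @ rewards
    w_ls = np.linalg.solve(A, b)   # least-squares target weights
    error = w_ls - w               # "error between the weights"
    return w + alpha * error       # update the weights directly along that error
```

With `alpha = 1.0` the update jumps straight to the least-squares solution; smaller values interpolate between the current weights and that target, which is one way to trade stability against convergence speed without tuning a classic TD step size.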

Key words: function approximation, reinforcement learning, gradient descent, least-squares, weights gradient descent

