Chinese Journal of Network and Information Security ›› 2023, Vol. 9 ›› Issue (4): 16-28. doi: 10.11959/j.issn.2096-109x.2023050

• Academic Papers •

Function approximation method based on weight gradient descent in reinforcement learning

Xiaoyan QIN1, Yuhan LIU2, Yunlong XU3, Bin LI4

  1. School of Information and Software, Global Institute of Software Technology, Suzhou 215163, China
    2. University of Waterloo, Waterloo N2L 3G4, Canada
    3. Applied Technology College, Soochow University, Suzhou 215325, China
    4. School of Computer Science and Technology, Soochow University, Suzhou 215325, China
  • Revised: 2023-05-30 • Online: 2023-08-01 • Published: 2023-08-01
  • About the authors: Xiaoyan QIN (1984- ), female, born in Taizhou, Jiangsu, is an associate professor at the School of Information and Software, Global Institute of Software Technology. Her research interests include software engineering and artificial intelligence.
    Yuhan LIU (2000- ), female, born in Daqing, Heilongjiang, is a master's student at the University of Waterloo, Canada. Her research interest is digital media technology.
    Yunlong XU (1964- ), male, born in Suzhou, Jiangsu, is an associate professor at Soochow University. His research interests include machine learning and operating systems.
    Bin LI (1994- ), male, born in Zhenjiang, Jiangsu. His research interest is reinforcement learning.
  • Supported by:
    The National Natural Science Foundation of China (61772355, 61702055, 61876217, 62176175); the Major Program of Natural Science Research of Jiangsu Higher Education Institutions (18KJA520011, 17KJA520004); the Suzhou Applied Basic Research Program, Industrial Part (SYG201422); the Jiangsu Province High-End Research and Training Project for Professional Leaders of Teachers in Vocational Colleges (2021GRFX052); the Priority Academic Program Development of Jiangsu Higher Education Institutions; the Jiangsu Province Vocational Education "Double-Qualified" Master Teacher Studio in Software Technology


Abstract:

Function approximation has attracted extensive attention in reinforcement learning research because it can effectively handle problems with large-scale or continuous state and action spaces. Although function approximation based on gradient descent is among the most widely used methods in reinforcement learning, it is demanding on the step-size parameter: an inappropriate value can cause slow convergence, unstable convergence, or even divergence. To address these problems, the weight-update rule of the temporal-difference (TD) algorithm with function approximation was improved by building on the least-squares method and gradient descent, yielding a weight gradient descent (WGD) method. The least-squares method is used to solve the value function for the weights; following the ideas of TD and gradient descent, the error between these weights and the current weights is computed and used to update the weights directly. Updating the weights in this new manner effectively reduces the algorithm's consumption of computing resources, and the method can also be used to improve other gradient-descent-based function approximation algorithms, making it applicable to many gradient-descent-based reinforcement learning algorithms. Experiments show that WGD can adjust parameters in a wider space, effectively reduces the possibility of divergence, and improves convergence speed while maintaining good convergence performance.
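For context, the step-size sensitivity described above concerns the standard semi-gradient TD(0) update (textbook form, not taken from this paper), in which the step size $\alpha$ scales a value-space TD error; WGD, as described in the abstract, instead computes an error in weight space. A plausible reading, assuming $\mathbf{w}_{LS}$ denotes the least-squares weights:

$$\mathbf{w} \leftarrow \mathbf{w} + \alpha \big[ r + \gamma \hat{v}(s'; \mathbf{w}) - \hat{v}(s; \mathbf{w}) \big] \, \nabla_{\mathbf{w}} \hat{v}(s; \mathbf{w}) \quad \text{(semi-gradient TD(0))}$$

$$\mathbf{w} \leftarrow \mathbf{w} + \alpha \, (\mathbf{w}_{LS} - \mathbf{w}) \quad \text{(WGD-style weight-space update; assumed form)}$$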

Key words: function approximation, reinforcement learning, gradient descent, least squares, weight gradient descent
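To make the update concrete, below is a minimal sketch in Python of one plausible implementation for linear value approximation V(s) ≈ φ(s)ᵀw: LSTD-style statistics yield the least-squares weights, and the weight-space error drives the update. The class name WGDLinearTD, the ridge regularization, and the exact update form are illustrative assumptions, not the paper's published algorithm.

```python
import numpy as np

# Minimal sketch (an assumed reconstruction, not the paper's published code):
# linear value approximation V(s) ~= phi(s) . w, with LSTD-style least-squares
# statistics providing a target weight vector, and a "weight gradient descent"
# step that moves the current weights along the error *between weights*.

class WGDLinearTD:
    def __init__(self, n_features, gamma=0.99, alpha=0.1, reg=1e-3):
        self.gamma = gamma                 # discount factor
        self.alpha = alpha                 # step size on the weight-space error
        self.w = np.zeros(n_features)      # current weights
        # running LSTD statistics: solving A w = b gives least-squares weights
        self.A = reg * np.eye(n_features)  # small ridge term keeps A invertible
        self.b = np.zeros(n_features)

    def update(self, phi, reward, phi_next):
        """Consume one transition (phi(s), r, phi(s')) and update the weights."""
        # accumulate the least-squares (LSTD) statistics
        self.A += np.outer(phi, phi - self.gamma * phi_next)
        self.b += reward * phi
        # least-squares solution for the value function seen so far
        w_ls = np.linalg.solve(self.A, self.b)
        # error between weights (not a value-space TD error), applied directly
        self.w += self.alpha * (w_ls - self.w)

    def value(self, phi):
        return float(self.w @ phi)
```

As a usage sketch, instantiate with the feature dimension and call update(phi, r, phi_next) once per observed transition. Note that, as written, the sketch solves an n×n linear system every step; a practical variant might maintain the inverse incrementally (e.g., a Sherman-Morrison update, as in recursive LSTD) or refresh the least-squares target only periodically.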


