[1] |
SUTTON R S , BARTO A G . Reinforcement learning:an introduction[J]. IEEE Transactions on Neural Networks, 1998,9(5): 1054-1054.
|
[2] |
陈兴国, 俞扬 . 强化学习及其在电脑围棋中的应用[J]. 自动化学报, 2016,42(5): 685-695.
|
|
CHEN X G , YU Y . Reinforcement learning and its application to the game of GO[J]. Acta Automatica Sinica, 2016,42(5): 685-695.
|
[3] |
HU Y J , GAO Y , AN B . Multiagent reinforcement learning with unshared value functions[J]. IEEE Transactions on Cybernetics, 2015,45(4): 647-662.
|
[4] |
POLYDOROS A S , NALPANTIDIS L . Survey of model-based reinforcement learning:applications on robotics[J]. Journal of Intelligent & Robotic Systems, 2017,86(2): 1-21.
|
[5] |
SSENGONZI C , KOGEDA O P , OLWAL T O . A survey of deep reinforcement learning application in 5G and beyond network slicing and virtualization[R]. 2022.
|
[6] |
WU Y , MOZIFIAN M , SHKURTI F . Shaping rewards for reinforcement learning with imperfect demonstrations using generative models[C]// 2021 IEEE International Conference on Robotics and Automation (ICRA). 2021: 6628-6634.
|
[7] |
XU XIN , ZUO LEI , HUANG Z H . Reinforcement learning algorithms with function approximation:Recent advances and applications[J]. Information Sciences, 2014,261(5): 1-31.
|
[8] |
GRONDMAN I , BUSONIU L , LOPES G A D ,et al. A survey of actor-critic reinforcement learning:standard and natural policy gradients[J]. IEEE Transactions on Systems,Man,and Cybernetics,Part C (Applications and Reviews), 2012,42(6): 1291-1307.
|
[9] |
VAN-SEIJEN H , MAHMOOD A R , PILARSKI P M ,et al. True online temporal-difference learning[J]. Journal of Machine Learning Research, 2015,17(1): 5057-5096.
|
[10] |
LI K , BURDICK J W . A function approximation method for model-based high-dimensional inverse reinforcement learning[R]. 2017.
|
[11] |
FUJIMOTO S , GU S S . A minimalist approach to offline reinforcement learning[C]// Advances in Neural Information Processing Systems, 2021,34: 20132-20145.
|
[12] |
THOMAS P S , BRUNSKILL E . Policy gradient methods for reinforcement learning with function approximation and action-dependent baselines[R]. 2017.
|
[13] |
MNIH V , KAVUKCUOGLU K , SILVER D ,et al. Human-level control through deep reinforcement learning[J]. Nature, 2015,518(7540): 529-533.
|
[14] |
郭潇逍, 李程, 梅俏竹 . 深度学习在游戏中的应用[J]. 自动化学报, 2016,42(5): 676-684.
|
|
GUO X X , LI C , MEI Q Z . Deep learning applied to games[J]. Acta Automatica Sinica, 2016,42(5): 676-684.
|
[15] |
GEIST M , PIETQUIN O . Parametric value function approximation:a unified view[C]// Proceedings of the 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning. 2011.
|
[16] |
FAIRBANK M , ALONSO E . The divergence of reinforcement learning algorithms with value-iteration and function approximation[C]// International Joint Conference on Neural Networks. 2012.
|
[17] |
BHANDARI J , RUSSO D , SINGAL R . A finite time analysis of temporal difference learning with linear function approximation[R]. 2018.
|
[18] |
AWHEDA M D , SCHWARTZ H M . A residual gradient fuzzy reinforcement learning algorithm for differential games[J]. International Journal of Fuzzy Systems, 2017.
|
[19] |
BO L , JI L , GHAVAMZADEH M ,et al. Proximal gradient temporal difference learning algorithms[C]// International Joint Conference on Artificial Intelligence. 2016.
|