[1] SUTTON R S, BARTO G A. Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 1998.

[2] ZHU F, LIU Q, FU Q M, et al. A least square actor-critic approach for continuous action space[J]. Journal of Computer Research and Development, 2014, 51(3): 548-558.

[3] SUN Z J, XUE L, XU Y M, et al. Overview of deep learning[J]. Application Research of Computers, 2012, 29(8): 2806-2810.

[4] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444.

[5] HINTON G E, OSINDERO S, TEH Y W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527-1554.

[6] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.

[7] SILVER D, SCHRITTWIESER J, SIMONYAN K, et al. Mastering the game of Go without human knowledge[J]. Nature, 2017, 550(7676): 354-359.

[8] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[C]// Workshops at the 26th Neural Information Processing Systems. 2013.

[9] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.

[10] WATKINS C J C H. Learning from delayed rewards[D]. Cambridge: King's College, University of Cambridge, 1989.

[11] VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[C]// The AAAI Conference on Artificial Intelligence. 2016.

[12] VAN HASSELT H. Double Q-learning[C]// Advances in Neural Information Processing Systems. 2010.

[13] SCHAUL T, QUAN J, ANTONOGLOU I, et al. Prioritized experience replay[C]// The 4th International Conference on Learning Representations. 2016: 322-355.

[14] SUTTON R S, MCALLESTER D, SINGH S, et al. Policy gradient methods for reinforcement learning with function approximation[J]. Advances in Neural Information Processing Systems, 2000, 12: 1057-1063.

[15] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[C]// The 4th International Conference on Learning Representations. 2016.

[16] SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C]// The International Conference on Machine Learning. 2014.

[17] GIVAN R, DEAN T, GREIG M. Equivalence notions and model minimization in Markov decision processes[J]. Artificial Intelligence, 2003, 147(1-2): 163-223.

[18] FERNS N, PANANGADEN P, PRECUP D. Metrics for finite Markov decision processes[C]// The 20th Conference on Uncertainty in Artificial Intelligence. 2004.