[1] SUTTON R S, BARTO A G. Reinforcement Learning: An Introduction[M]. Cambridge: MIT Press, 1998.
[2] XU X. Reinforcement Learning and Approximate Dynamic Programming[M]. Beijing: Science Press, 2010.
[3] LIU Q, FU Q M, GONG S R, et al. Reinforcement learning algorithm based on minimum state method and average reward[J]. Journal on Communications, 2011, 32(1): 66-71.
[4] XIAO F, LIU Q, FU Q M, et al. Gradient descent Sarsa(λ) algorithm based on the adaptive potential function shaping reward mechanism[J]. Journal on Communications, 2013, 34(1): 77-88.
[5] SZEPESVÁRI C. Algorithms for Reinforcement Learning[M]. San Rafael: Morgan & Claypool, 2010.
[6] WATKINS C. Learning from Delayed Rewards[D]. Cambridge: King's College, University of Cambridge, 1989.
[7] SUTTON R S. Dyna, an integrated architecture for learning, planning, and reacting[J]. SIGART Bulletin, 1991, 2: 160-163.
[8] SUTTON R S, SZEPESVÁRI C, GERAMIFARD A, et al. Dyna-style planning with linear function approximation and prioritized sweeping[A]. Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence[C]. Finland: AUAI, 2008.
[9] WINGATE D, SEPPI K D. Prioritized methods for accelerating MDP solvers[J]. Journal of Machine Learning Research, 2005, 6: 851-881.
[10] MEULEAU N, BOURGINE P. Exploration of multi-state environments: local measures and back-propagation of uncertainty[J]. Machine Learning, 1999, 35(2): 117-154.
[11] COGGAN M. Exploration and exploitation in reinforcement learning[A]. Proceedings of the 4th International Conference on Computational Intelligence and Multimedia Applications[C]. Japan, 2001.
[12] STREHL A L, LITTMAN M L. A theoretical analysis of model-based interval estimation[A]. Proceedings of the 22nd International Conference on Machine Learning[C]. New York: ACM, 2005.
[13] MEULEAU N, BOURGINE P. Exploration of multi-state environments: local measures and back-propagation of uncertainty[J]. Machine Learning, 1999, 35(2): 117-154.
[14] DEARDEN R, FRIEDMAN N, RUSSELL S. Bayesian Q-learning[A]. Proceedings of the 15th National Conference on Artificial Intelligence[C]. Menlo Park: AAAI Press, 1998.
[15] DEARDEN R, FRIEDMAN N, ANDRE D. Model-based Bayesian exploration[A]. Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence[C]. San Francisco: Morgan Kaufmann, 1999.
[16] ASMUTH J, LITTMAN M L, et al. Potential-based shaping in model-based reinforcement learning[A]. Proceedings of the 23rd AAAI Conference on Artificial Intelligence[C]. Chicago: AAAI Press, 2008.
[17] PENG J, WILLIAMS R J. Efficient learning and planning within the Dyna framework[J]. Adaptive Behavior, 1993, 2: 437-454.
[18] DEGROOT M, SCHERVISH M. Probability and Statistics[M]. New York: Pearson Education, 2010.
[19] TEACY W, CHALKIADAKIS G, FARINELLI A. Decentralised Bayesian reinforcement learning for online agent collaboration[A]. Proceedings of the 11th International Joint Conference on Autonomous Agents and Multi-Agent Systems[C]. Spain: IFAAMAS, 2012.