Bayesian Q learning method with Dyna architecture and prioritized sweeping
YU Jun1, LIU Quan1,2, FU Qiming1, SUN Hongkun1, CHEN Guixing1
Journal on Communications. 2013, (11): 15-139.