Journal on Communications ›› 2014, Vol. 35 ›› Issue (8): 56-62. doi: 10.3969/j.issn.1000-436x.2014.08.008

• Academic Paper •

Optimized algorithm for value iteration based on topological sequence backups

Wei HUANG1, Quan LIU1,2, Hong-kun SUN1, Qi-ming FU1, Xiao-ke ZHOU1

  1. School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
  • Online: 2014-08-25 Published: 2017-06-29
  • Supported by:
    The National Natural Science Foundation of China (four grants); The Natural Science Foundation of Jiangsu Province; The Natural Science Research Foundation of Jiangsu Higher Education Institutions (two grants); Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University

Abstract:

To improve convergence performance, this paper proposes VI-TS, a value iteration algorithm based on topological sequence backups. VI-TS uses the transition relationships between states to decompose the directed graph of the task model (an MDP) into a series of smaller strongly connected components, and then backs up these components in topological order, which avoids unnecessary backups. Experiments on the classical planning problems Mountain Car and maze navigation show that VI-TS converges faster, reaches higher accuracy, and is robust to growth of the state space.

Key words: reinforcement learning, value iteration, topological sequence, VI-TS
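
The idea summarized in the abstract can be illustrated with a minimal Python sketch. This is a generic topological value iteration under stated assumptions, not the authors' VI-TS implementation: the MDP encoding (`mdp[state][action]` as a list of `(probability, next_state, reward)` tuples), the function names, the toy `example_mdp`, and the parameters `gamma` and `tol` are all hypothetical choices made for illustration. Strongly connected components are found with Tarjan's algorithm, which emits them with successor components first, so each component is solved only after every component it can reach.

```python
def successor_graph(mdp):
    """Map each state to the set of states reachable in one step."""
    return {
        s: {ns for outcomes in actions.values() for _, ns, _ in outcomes}
        for s, actions in mdp.items()
    }


def tarjan_scc(successors):
    """Return strongly connected components, successor components first."""
    index, low = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in successors[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    for v in successors:
        if v not in index:
            visit(v)
    return components


def topological_value_iteration(mdp, gamma=0.95, tol=1e-8):
    """Run value iteration one component at a time, successors first."""
    values = {s: 0.0 for s in mdp}
    backups = 0
    for component in tarjan_scc(successor_graph(mdp)):
        while True:
            delta = 0.0
            for s in component:
                if not mdp[s]:  # terminal state: value stays 0
                    continue
                best = max(
                    sum(p * (r + gamma * values[ns]) for p, ns, r in outcomes)
                    for outcomes in mdp[s].values()
                )
                delta = max(delta, abs(best - values[s]))
                values[s] = best
                backups += 1
            if delta < tol:
                break
    return values, backups


# Hypothetical toy MDP: s1 and s1b form a cycle, everything else is acyclic.
example_mdp = {
    "s0": {"go": [(1.0, "s1", 0.0)]},
    "s1": {"go": [(1.0, "s2", 0.0)], "back": [(1.0, "s1b", 0.0)]},
    "s1b": {"fwd": [(1.0, "s1", 0.0)]},
    "s2": {"go": [(1.0, "goal", 1.0)]},
    "goal": {},
}

values, backups = topological_value_iteration(example_mdp, gamma=0.9)
print(values, backups)
```

In this sketch only the cyclic component containing `s1` and `s1b` needs repeated sweeps; the acyclic states settle after a single backup plus one confirming sweep, which is the kind of saving the abstract attributes to avoiding unnecessary backups. The recursive `visit` is adequate for small examples; a large state space would need an iterative variant to stay within Python's recursion limit.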
