On-policy temporal difference algorithm based on double-layer fuzzy partitioning

TD algorithm based on double-layer fuzzy partitioning

Online: 2013-10-25  Published: 2013-10-15

Abstract

When dealing with continuous-space problems, traditional Q-iteration algorithms based on lookup tables or function approximation converge slowly and have difficulty producing a continuous policy. To overcome these weaknesses, an on-policy TD algorithm named DFP-OPTD is proposed, based on double-layer fuzzy partitioning, and its convergence is proved. The first layer of fuzzy partitioning is applied to the state space and the second layer to the action space; the Q-value function is computed by combining the two layers. Based on this Q-value function, the consequent parameters of the fuzzy rules are updated by gradient descent. Experimental results on two classical reinforcement learning problems show that DFP-OPTD not only yields a continuous-action policy but also has better convergence performance.
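
The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the DFP-OPTD algorithm itself. It assumes a one-dimensional state and action, normalized triangular membership functions for both layers, a Q-value that is linear in the products of state and action memberships, and a SARSA-style gradient update of the consequent parameters. The names `triangular_memberships`, `TwoLayerFuzzyQ`, `theta`, `alpha`, and `gamma` are hypothetical and do not come from the paper.

```python
def triangular_memberships(x, centers):
    """Normalized triangular memberships of x over sorted centers (assumed shape)."""
    x = min(max(x, centers[0]), centers[-1])  # clamp x into the covered range
    mu = [0.0] * len(centers)
    for k in range(len(centers) - 1):
        left, right = centers[k], centers[k + 1]
        if left <= x <= right:
            w = (x - left) / (right - left)
            mu[k] = 1.0 - w      # degree of the left fuzzy set
            mu[k + 1] = w        # degree of the right fuzzy set
            break
    return mu


class TwoLayerFuzzyQ:
    """Illustrative Q-function: layer 1 partitions states, layer 2 partitions actions."""

    def __init__(self, state_centers, action_centers, alpha=0.1, gamma=0.9):
        self.state_centers = state_centers
        self.action_centers = action_centers
        self.alpha = alpha  # learning rate (assumed)
        self.gamma = gamma  # discount factor (assumed)
        # One consequent parameter per (state fuzzy set, action fuzzy set) pair.
        self.theta = [[0.0] * len(action_centers) for _ in state_centers]

    def features(self, s, a):
        phi = triangular_memberships(s, self.state_centers)   # first layer
        psi = triangular_memberships(a, self.action_centers)  # second layer
        return [[p * q for q in psi] for p in phi]

    def q_value(self, s, a):
        f = self.features(s, a)
        return sum(f[i][j] * self.theta[i][j]
                   for i in range(len(f)) for j in range(len(f[i])))

    def update(self, s, a, r, s_next, a_next, done=False):
        """On-policy (SARSA-style) TD update; returns the TD error."""
        target = r if done else r + self.gamma * self.q_value(s_next, a_next)
        delta = target - self.q_value(s, a)
        f = self.features(s, a)
        for i in range(len(f)):
            for j in range(len(f[i])):
                # Gradient of Q with respect to theta[i][j] is f[i][j].
                self.theta[i][j] += self.alpha * delta * f[i][j]
        return delta
```

Because the action memberships vary continuously with the action, Q(s, a) in this sketch is defined for any action in the covered range rather than only at a fixed set of discrete actions, which is the property the abstract attributes to partitioning the action space as well as the state space.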
