Metrics for Markov Decision Processes with Infinite State Spaces
This work addresses a foundational challenge in reinforcement learning for continuous or large-scale MDPs, although the contribution appears incremental because it extends existing bisimulation concepts to infinite state spaces.
The authors tackle the problem of measuring state similarity in Markov decision processes (MDPs) with infinite state spaces, and show that the optimal value function of a discounted infinite-horizon planning task varies continuously with respect to the proposed metric distances.
We present metrics for measuring state similarity in Markov decision processes (MDPs) with infinitely many states, including MDPs with continuous state spaces. Such metrics provide a stable quantitative analogue of the notion of bisimulation for MDPs, and are suitable for use in MDP approximation. We show that the optimal value function associated with a discounted infinite horizon planning task varies continuously with respect to our metric distances.
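For intuition only, the continuity claim can be illustrated on a small finite MDP. The sketch below is not the construction for infinite state spaces described in the abstract. It assumes a hypothetical 4-state, 2-action MDP with deterministic transitions, so that the distance between next-state distributions reduces to the distance between the two successor states. The weights `C_R` and `C_T`, the tables `REWARD` and `NEXT`, and both function names are illustrative assumptions. Under these assumptions, iterating the recurrence from zero gives a distance `d` that satisfies `C_R * |V*(s) - V*(t)| <= d(s, t)`, so states that are close under `d` have close optimal values.

```python
# Illustrative sketch: a bisimulation-style distance on a hypothetical finite,
# deterministic MDP. All names and numbers here are assumptions for the example.

GAMMA = 0.9        # discount factor
C_T = GAMMA        # weight on the transition term (assumed >= GAMMA)
C_R = 1.0 - GAMMA  # weight on the reward term (assumed so C_R + C_T <= 1)

STATES = [0, 1, 2, 3]
ACTIONS = [0, 1]
# REWARD[s][a] is the immediate reward; NEXT[s][a] is the unique successor state.
REWARD = [[0.0, 0.1], [0.0, 0.2], [1.0, 0.5], [0.9, 0.5]]
NEXT = [[1, 2], [0, 3], [2, 0], [3, 1]]


def bisim_metric(iters=500):
    """Iterate d(s,t) = max_a [C_R*|r(s,a)-r(t,a)| + C_T*d(s',t')] from d = 0."""
    d = {(s, t): 0.0 for s in STATES for t in STATES}
    for _ in range(iters):
        d = {
            (s, t): max(
                C_R * abs(REWARD[s][a] - REWARD[t][a])
                + C_T * d[(NEXT[s][a], NEXT[t][a])]
                for a in ACTIONS
            )
            for (s, t) in d
        }
    return d


def optimal_values(iters=500):
    """Standard value iteration for the discounted infinite-horizon task."""
    v = [0.0] * len(STATES)
    for _ in range(iters):
        v = [
            max(REWARD[s][a] + GAMMA * v[NEXT[s][a]] for a in ACTIONS)
            for s in STATES
        ]
    return v


if __name__ == "__main__":
    d = bisim_metric()
    v = optimal_values()
    for s in STATES:
        for t in STATES:
            # The value gap, scaled by C_R, should never exceed the distance.
            print(s, t, round(C_R * abs(v[s] - v[t]), 4), "<=", round(d[(s, t)], 4))
```

With stochastic transitions, the successor-distance term would have to be replaced by a distance between probability distributions, which this sketch deliberately avoids.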