Metrics for Finite Markov Decision Processes
This work addresses state aggregation and value function approximation in reinforcement learning, but appears incremental as it builds on existing bisimulation concepts.
The paper addresses the problem of measuring state similarity in finite Markov decision processes by introducing bisimulation-based metrics intended for discounted infinite-horizon reinforcement learning tasks; its results include bounds relating metric distances to optimal state values.
We present metrics for measuring the similarity of states in a finite Markov decision process (MDP). The formulation of our metrics is based on the notion of bisimulation for MDPs, with the aim of solving discounted infinite-horizon reinforcement learning tasks. Such metrics can be used to aggregate states, as well as to better structure other value function approximators (e.g., memory-based or nearest-neighbor approximators). We provide bounds that relate our metric distances to the optimal values of states in the given MDP.
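The abstract does not define the metric itself. This minimal sketch assumes a fixed-point form of the kind used for bisimulation metrics: d(s, t) = max_a [c_R |R(s, a) - R(t, a)| + c_T K(d)(P(s, a), P(t, a))], where K(d) is the Kantorovich (optimal-transport) distance between successor distributions under d. To stay within the standard library, the sketch also assumes a deterministic MDP. In that case K(d) between two point masses is just d of the two successor states. Stochastic transitions would require a linear-program solve at that step. The functions `bisim_metric`, `optimal_values` and `aggregate`, the toy MDP, and the greedy epsilon-threshold grouping are hypothetical illustrations, not the paper's method. With c_R = 1 and c_T equal to the discount factor gamma, the fixed point satisfies |V*(s) - V*(t)| <= d(s, t). The sketch checks this numerically.

```python
from itertools import product

# Hypothetical toy setup: a deterministic finite MDP given as dicts keyed by
# (state, action). ASSUMPTION: transitions are deterministic, so the Kantorovich
# distance between successor distributions reduces to d(next_s, next_t).


def bisim_metric(states, actions, reward, step, c_r=1.0, c_t=0.9,
                 tol=1e-10, max_iter=10_000):
    """Iterate d <- F(d) from d = 0; F is a c_t-contraction when c_t < 1."""
    d = {(s, t): 0.0 for s, t in product(states, states)}
    for _ in range(max_iter):
        new_d = {
            (s, t): max(
                c_r * abs(reward[(s, a)] - reward[(t, a)])
                + c_t * d[(step[(s, a)], step[(t, a)])]
                for a in actions
            )
            for s, t in product(states, states)
        }
        delta = max(abs(new_d[k] - d[k]) for k in d)
        d = new_d
        if delta < tol:
            break
    return d


def optimal_values(states, actions, reward, step, gamma=0.9,
                   tol=1e-10, max_iter=10_000):
    """Plain value iteration for V* on the same deterministic MDP."""
    v = {s: 0.0 for s in states}
    for _ in range(max_iter):
        new_v = {s: max(reward[(s, a)] + gamma * v[step[(s, a)]] for a in actions)
                 for s in states}
        delta = max(abs(new_v[s] - v[s]) for s in states)
        v = new_v
        if delta < tol:
            break
    return v


def aggregate(states, d, eps):
    """Greedy, order-dependent grouping: a state joins the first group whose
    representative (first member) lies within eps; otherwise it starts a group."""
    groups = []
    for s in states:
        for g in groups:
            if d[(g[0], s)] <= eps:
                g.append(s)
                break
        else:
            groups.append([s])
    return groups


# Hypothetical 4-state, 2-action example; s0 and s1 behave identically.
states = ["s0", "s1", "s2", "s3"]
actions = ["a", "b"]
reward = {("s0", "a"): 1.0, ("s0", "b"): 0.0,
          ("s1", "a"): 1.0, ("s1", "b"): 0.0,
          ("s2", "a"): 0.0, ("s2", "b"): 0.0,
          ("s3", "a"): 0.5, ("s3", "b"): 0.0}
step = {("s0", "a"): "s2", ("s0", "b"): "s0",
        ("s1", "a"): "s2", ("s1", "b"): "s1",
        ("s2", "a"): "s2", ("s2", "b"): "s2",
        ("s3", "a"): "s3", ("s3", "b"): "s3"}

gamma = 0.9
d = bisim_metric(states, actions, reward, step, c_r=1.0, c_t=gamma)
v = optimal_values(states, actions, reward, step, gamma=gamma)

# With c_r = 1 and c_t = gamma, d upper-bounds the gap in optimal values.
assert all(abs(v[s] - v[t]) <= d[(s, t)] + 1e-6
           for s, t in product(states, states))

groups = aggregate(states, d, eps=0.5)
print(groups)  # → [['s0', 's1'], ['s2'], ['s3']]
```

The bound holds here because c_T is set equal to gamma. With a smaller c_T the metric discounts future differences more heavily than the MDP does. For example, c_T = 0.5 gives d(s2, s3) = 1, while |V*(s2) - V*(s3)| = 5. Keeping c_T below 1 is what makes the iteration a contraction, so it converges from d = 0.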