AI · Jul 11, 2012

Metrics for Finite Markov Decision Processes

Norman Ferns, Prakash Panangaden, Doina Precup

arXiv:1207.4114v1 · 364 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses state aggregation and value function approximation in reinforcement learning, but appears incremental as it builds on existing bisimulation concepts.

The paper addresses the problem of measuring state similarity in finite Markov decision processes. It introduces metrics based on bisimulation, aimed at discounted infinite-horizon reinforcement learning tasks, and derives bounds relating metric distances to the optimal values of states.

We present metrics for measuring the similarity of states in a finite Markov decision process (MDP). The formulation of our metrics is based on the notion of bisimulation for MDPs, with an aim towards solving discounted infinite horizon reinforcement learning tasks. Such metrics can be used to aggregate states, as well as to better structure other value function approximators (e.g., memory-based or nearest-neighbor approximators). We provide bounds that relate our metric distances to the optimal values of states in the given MDP.
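To make the idea of a bisimulation-based state metric concrete, here is a minimal Python sketch. It is an illustration under stated assumptions rather than the authors' algorithm: it assumes the metric is the fixed point of d(s, t) = max_a [ c_r |r(s, a) - r(t, a)| + c_t * D(P(s, a), P(t, a)) ], and it further assumes deterministic transitions, so that the distance D between next-state distributions reduces to d(next(s, a), next(t, a)). The names `bisim_metric_deterministic`, `aggregate`, `c_r`, `c_t`, and the toy MDP are all hypothetical.

```python
def bisim_metric_deterministic(rewards, next_state, c_r=0.1, c_t=0.9,
                               tol=1e-9, max_iter=10_000):
    """Fixed-point iteration for a bisimulation-style metric.

    Assumption: transitions are deterministic, so comparing next-state
    distributions reduces to looking up d at the two successor states.
    rewards[s][a] is the reward and next_state[s][a] the successor of
    taking action a in state s.
    """
    n = len(rewards)
    n_actions = len(rewards[0])
    d = [[0.0] * n for _ in range(n)]  # start from the zero pseudometric
    for _ in range(max_iter):
        new_d = [
            [
                max(
                    c_r * abs(rewards[s][a] - rewards[t][a])
                    + c_t * d[next_state[s][a]][next_state[t][a]]
                    for a in range(n_actions)
                )
                for t in range(n)
            ]
            for s in range(n)
        ]
        delta = max(abs(new_d[s][t] - d[s][t])
                    for s in range(n) for t in range(n))
        d = new_d
        if delta < tol:  # updates contract with factor c_t < 1
            break
    return d


def aggregate(d, eps):
    """Greedily assign each state to the first representative within eps."""
    reps, labels = [], []
    for s in range(len(d)):
        for k, r in enumerate(reps):
            if d[s][r] <= eps:
                labels.append(k)
                break
        else:
            reps.append(s)
            labels.append(len(reps) - 1)
    return labels


# Toy MDP (hypothetical): states 0 and 1 behave identically, state 2 is absorbing.
rewards = [[1.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
next_state = [[1, 2], [0, 2], [2, 2]]

d = bisim_metric_deterministic(rewards, next_state)
labels = aggregate(d, eps=1e-6)  # states 0 and 1 share a cluster
```

In this toy example, states 0 and 1 are at distance zero and can be merged, while state 2 stays separate. For stochastic transitions the successor lookup would have to be replaced by a distance between probability distributions, which this sketch does not attempt.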
