AI · Jul 11, 2012

Metrics for Finite Markov Decision Processes

arXiv:1207.4114v1 · 364 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses state aggregation and value function approximation in reinforcement learning. It appears incremental, since it builds on the existing notion of bisimulation for MDPs.

The paper addresses the problem of measuring state similarity in finite Markov decision processes (MDPs). It introduces metrics based on bisimulation, aimed at discounted infinite-horizon reinforcement learning tasks, and provides bounds relating metric distances to the optimal values of states.

We present metrics for measuring the similarity of states in a finite Markov decision process (MDP). The formulation of our metrics is based on the notion of bisimulation for MDPs, with an aim towards solving discounted infinite horizon reinforcement learning tasks. Such metrics can be used to aggregate states, as well as to better structure other value function approximators (e.g., memory-based or nearest-neighbor approximators). We provide bounds that relate our metric distances to the optimal values of states in the given MDP.
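The abstract does not spell out the metric itself, so the following is only a minimal sketch of the idea under strong simplifying assumptions. It assumes a deterministic MDP, where comparing next-state distributions reduces to comparing the two successor states, and it weights the reward difference by 1 and the successor distance by the discount factor. The toy MDP (`NEXT`, `REWARD`), the helper names `bisim_distance` and `optimal_values`, and the value of `GAMMA` are all hypothetical and not taken from the paper. The general stochastic case needs a distance between distributions, which is not reproduced here.

```python
from itertools import product

GAMMA = 0.9  # assumed discount factor

# Hypothetical deterministic MDP: NEXT[s][a] is the successor state,
# REWARD[s][a] the immediate reward. States 1 and 2 behave identically.
NEXT = {
    0: {"a": 1, "b": 2},
    1: {"a": 1, "b": 0},
    2: {"a": 2, "b": 0},
    3: {"a": 3, "b": 0},
}
REWARD = {
    0: {"a": 0.0, "b": 0.0},
    1: {"a": 1.0, "b": 0.0},
    2: {"a": 1.0, "b": 0.0},
    3: {"a": 0.5, "b": 0.0},
}
ACTIONS = ("a", "b")


def bisim_distance(nxt, reward, gamma, iters=500):
    """Iterate d(s,t) = max_a(|r(s,a) - r(t,a)| + gamma * d(s', t')) from d = 0."""
    states = list(nxt)
    d = {(s, t): 0.0 for s, t in product(states, repeat=2)}
    for _ in range(iters):
        # Synchronous update: the right-hand side reads the previous iterate.
        d = {
            (s, t): max(
                abs(reward[s][a] - reward[t][a])
                + gamma * d[(nxt[s][a], nxt[t][a])]
                for a in ACTIONS
            )
            for s, t in d
        }
    return d


def optimal_values(nxt, reward, gamma, iters=500):
    """Plain value iteration for the same deterministic MDP, starting from V = 0."""
    v = {s: 0.0 for s in nxt}
    for _ in range(iters):
        v = {
            s: max(reward[s][a] + gamma * v[nxt[s][a]] for a in ACTIONS)
            for s in nxt
        }
    return v


D = bisim_distance(NEXT, REWARD, GAMMA)
V = optimal_values(NEXT, REWARD, GAMMA)
```

In this toy example states 1 and 2 end up at distance zero, so they could be merged into one aggregate state. For every pair of states, the gap between optimal values stays within the computed distance, which is the kind of bound the abstract refers to.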

