Methods for computing state similarity in Markov Decision Processes
This work addresses a practical bottleneck in MDP state aggregation for researchers and practitioners, though it is incremental as it builds on existing bisimulation-based methods.
The paper tackles the computational cost of using the Kantorovich metric for state similarity in Markov Decision Processes by applying network optimization and statistical sampling techniques, resulting in new distance functions with varying trade-offs in time, space, and aggregation quality, as evaluated empirically.
A popular approach to solving large probabilistic systems relies on aggregating states based on a measure of similarity. Many approaches in the literature are heuristic. A number of recent methods rely instead on metrics based on the notion of bisimulation, or behavioral equivalence between states (Givan et al, 2001, 2003; Ferns et al, 2004). An integral component of such metrics is the Kantorovich metric between probability distributions. However, while this metric enables many satisfying theoretical properties, it is costly to compute in practice. In this paper, we use techniques from network optimization and statistical sampling to overcome this problem. We obtain in this manner a variety of distance functions for MDP state aggregation, which differ in the tradeoff between time and space complexity, as well as the quality of the aggregation. We provide an empirical evaluation of these trade-offs.