LGMLNov 9, 2018

Performance Guarantees for Homomorphisms Beyond Markov Decision Processes

arXiv:1811.03895v19 citations
Originality Incremental advance
AI Analysis

This work addresses scalability issues in reinforcement learning for real-world problems with huge state-action spaces, offering an incremental extension to existing frameworks.

The paper tackles the challenge of applying tabular solution methods to large-scale problems by exploring non-Markovian homomorphisms, showing that near-optimal performance can be guaranteed even without Markovian requirements, allowing for more state aggregation without compromising performance.

Most real-world problems have huge state and/or action spaces. Therefore, a naive application of existing tabular solution methods is not tractable on such problems. Nonetheless, these solution methods are quite useful if an agent has access to a relatively small state-action space homomorphism of the true environment and near-optimal performance is guaranteed by the map. A plethora of research is focused on the case when the homomorphism is a Markovian representation of the underlying process. However, we show that near-optimal performance is sometimes guaranteed even if the homomorphism is non-Markovian. Moreover, we can aggregate significantly more states by lifting the Markovian requirement without compromising on performance. In this work, we expand Extreme State Aggregation (ESA) framework to joint state-action aggregations. We also lift the policy uniformity condition for aggregation in ESA that allows even coarser modeling of the true environment.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes