LG AIJun 14, 2023

Near-Optimal Partially Observable Reinforcement Learning with Partial Online State Information

arXiv:2306.08762v45.33 citationsh-index: 74

Originality Highly original

AI Analysis

This work addresses the challenge of efficient reinforcement learning under sensing constraints for applications like robotics and autonomous systems, offering a principled separation between tractable and intractable regimes.

The paper tackles the problem of learning in partially observable Markov decision processes (POMDPs) with limited online state information, proving that achieving near-optimal policies can require exponential sample complexity in general but identifying structured subclasses where efficient learning is possible with regret bounds of $ ilde{O}(\sqrt{K})$.

Partially observable Markov decision processes (POMDPs) are a general framework for sequential decision-making under latent state uncertainty, yet learning in POMDPs is intractable in the worst case. Motivated by sensing and probing constraints in practice, we study how much online state information (OSI) is sufficient to enable efficient learning guarantees. We formalize a model in which the learner can query only partial OSI (POSI) during interaction. We first prove an information-theoretic hardness result showing that, for general POMDPs, achieving an $ε$-optimal policy can require sample complexity that is exponential unless full OSI is available. We then identify two structured subclasses that remain learnable under POSI and propose corresponding algorithms with provably efficient performance guarantees. In particular, we establish regret upper bounds with $\tilde{O}(\sqrt{K})$ dependence on the number of episodes $K$, together with complementary lower bounds, thereby delineating when POSI suffices for efficient reinforcement learning. Our results highlight a principled separation between intractable and tractable regimes under incomplete online state access and provide new tools for jointly optimizing POSI queries and learning control actions.

View on arXiv PDF

Similar