SYLG Sep 24, 2024

Agent-state based policies in POMDPs: Beyond belief-state MDPs

arXiv:2409.15703v1
13 citations
h-index: 27
Originality: Synthesis-oriented
AI Analysis

This work addresses learning in partially observable environments, a problem relevant to AI and robotics researchers. Its contribution is incremental: it synthesizes existing approaches rather than introducing a new paradigm.

The paper tackles a limitation of the belief-state MDP approach to POMDPs: it requires perfect knowledge of the system dynamics and is therefore inapplicable in learning settings. The paper presents a unified framework of agent-state based policies and reviews methods for finding good policies, including applications that improve Q-learning and actor-critic algorithms.

The traditional approach to POMDPs is to convert them into fully observed MDPs by considering a belief state as an information state. However, a belief-state based approach requires perfect knowledge of the system dynamics and is therefore not applicable in the learning setting where the system model is unknown. Various approaches to circumvent this limitation have been proposed in the literature. We present a unified treatment of some of these approaches by viewing them as models where the agent maintains a local recursively updateable agent state and chooses actions based on the agent state. We highlight the different classes of agent-state based policies and the various approaches that have been proposed in the literature to find good policies within each class. These include the designer's approach to find optimal non-stationary agent-state based policies, policy search approaches to find locally optimal stationary agent-state based policies, and the approximate information state approach to find approximately optimal stationary agent-state based policies. We then present how ideas from the approximate information state approach have been used to improve Q-learning and actor-critic algorithms for learning in POMDPs.
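To make the contrast in the abstract concrete, here is a minimal sketch of the two kinds of recursive update. It is an illustration under stated assumptions, not the paper's own construction: the function names, the nested-list layout of the transition model `P` and observation model `O`, and the choice of a finite window of recent (action, observation) pairs as the agent state are all hypothetical. A finite window is only one simple example of a recursively updateable agent state.

```python
def belief_update(belief, action, obs, P, O):
    """Bayes-filter belief update.

    Requires the model (assumed layout, for illustration only):
      P[a][s][s2] = probability of moving from state s to s2 under action a
      O[a][s2][y] = probability of observing y in state s2 after action a
    """
    n = len(belief)
    # Predict: push the current belief through the transition model.
    predicted = [sum(belief[s] * P[action][s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correct: weight each state by the likelihood of the observation.
    unnorm = [predicted[s2] * O[action][s2][obs] for s2 in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def agent_state_update(agent_state, action, obs, window=2):
    """Model-free recursive update (hypothetical example of an agent state).

    The agent state is a tuple holding the last `window` (action, obs) pairs,
    so the update uses only locally available data, never P or O.
    """
    return (agent_state + ((action, obs),))[-window:]


def act(policy, agent_state, default_action=0):
    """Stationary agent-state based policy stored as a lookup table."""
    return policy.get(agent_state, default_action)
```

The belief update cannot be computed without `P` and `O`, which is why it does not carry over to the learning setting; the agent-state update needs only the agent's own actions and observations, so a policy defined on it can be learned from data.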

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
