LLM-Guided Probabilistic Program Induction for POMDP Model Estimation
This work addresses model estimation for POMDPs, which is relevant to robotics and decision-making under uncertainty, but the contribution is incremental because it builds on existing LLM and probabilistic-program methods.
The paper tackles the problem of learning POMDP models by using an LLM to generate and refine low-complexity probabilistic programs, and reports that this can be more effective than tabular POMDP learning, behavior cloning, or direct LLM planning on classical toy problems, simulated MiniGrid domains, and real robotics domains.
Partially Observable Markov Decision Processes (POMDPs) model decision making under uncertainty. While there are many approaches to approximately solving POMDPs, we aim to address the problem of learning such models. In particular, we are interested in a subclass of POMDPs wherein the components of the model, including the observation function, reward function, transition function, and initial state distribution function, can be modeled as low-complexity probabilistic graphical models in the form of a short probabilistic program. Our strategy to learn these programs uses an LLM as a prior, generating candidate probabilistic programs that are then tested against the empirical distribution and adjusted through feedback. We experiment on a number of classical toy POMDP problems, simulated MiniGrid domains, and two real mobile-base robotics search domains involving partial observability. Our results show that using an LLM to guide the construction of a low-complexity POMDP model can be more effective than tabular POMDP learning, behavior cloning, or direct LLM planning.
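The propose, test, and adjust loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy two-state sensor domain, the names `candidate`, `log_likelihood`, `propose_candidates`, and `fit`, and above all the proposal step, where a simple numeric stub stands in for the LLM. The sketch only shows the general pattern of scoring candidate observation models against logged data and feeding the best one back into the next round of proposals.

```python
import math
import random

def candidate(accuracy):
    """A short 'probabilistic program': P(observation | state) for a noisy sensor."""
    def obs_dist(state):
        other = "right" if state == "left" else "left"
        return {state: accuracy, other: 1.0 - accuracy}
    return obs_dist

def log_likelihood(obs_dist, data):
    """Sum of log P(obs | state) over logged (state, obs) pairs."""
    total = 0.0
    for state, obs in data:
        p = obs_dist(state).get(obs, 0.0)
        total += math.log(max(p, 1e-12))  # floor avoids log(0)
    return total

def propose_candidates(feedback):
    """Hypothetical stand-in for the LLM call (an assumption, not the paper's method).

    Returns three candidate sensor accuracies around the previous best guess.
    """
    center, step = feedback["best_accuracy"], feedback["step"]
    guesses = [center - step, center, center + step]
    return [min(max(g, 0.01), 0.99) for g in guesses]

def fit(data, rounds=6):
    """Propose candidates, score them against the data, feed the best one back."""
    feedback = {"best_accuracy": 0.5, "step": 0.2}
    for _ in range(rounds):
        accuracies = propose_candidates(feedback)
        best = max(accuracies, key=lambda a: log_likelihood(candidate(a), data))
        feedback = {"best_accuracy": best, "step": feedback["step"] / 2}
    return feedback["best_accuracy"]

def simulate(true_accuracy, n, seed=0):
    """Generate logged (state, observation) pairs from a hidden ground-truth sensor."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        state = rng.choice(["left", "right"])
        other = "right" if state == "left" else "left"
        obs = state if rng.random() < true_accuracy else other
        data.append((state, obs))
    return data

data = simulate(true_accuracy=0.85, n=2000)
estimate = fit(data)
print(f"estimated sensor accuracy: {estimate:.3f}")
```

In this sketch the candidate is a single-parameter function, so the feedback is just a number. In the setting the abstract describes, the candidates are programs and the feedback would presumably carry information about where a program disagrees with the empirical distribution. The same scoring idea would apply to the transition, reward, and initial state components.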