LG · AI · Feb 17, 2022

BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs

arXiv:2202.08884v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the poor scalability of Bayesian reinforcement learning (BRL) in partially observable environments. It is aimed at researchers and practitioners working in that setting and is an incremental improvement over earlier BRL frameworks.

The paper proposes a representation-agnostic formulation of BRL under partial observability and a new method, BADDr, which parameterizes the dynamics with dropout networks so that inference over the joint belief on state and dynamics scales better. BADDr is competitive with state-of-the-art BRL methods on small domains and also solves much larger ones.

While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with varied success. This work presents a representation-agnostic formulation of BRL under partial observability, unifying the previous models under one theoretical umbrella. To demonstrate its practical significance we also propose a novel derivation, Bayes-Adaptive Deep Dropout RL (BADDr), based on dropout networks. Under this parameterization, in contrast to previous work, the belief over the state and dynamics is a more scalable inference problem. We choose actions through Monte-Carlo tree search and empirically show that our method is competitive with state-of-the-art BRL methods on small domains while being able to solve much larger ones.
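To make "a belief over the state and dynamics" under a dropout parameterization more concrete, here is a toy sketch. It is not the paper's algorithm. It assumes that each dropout mask of a small, already trained dynamics network stands for one sampled dynamics model, and it tracks (state, mask) particles with a plain importance-resampling filter. The weights, dropout rate, noise levels, and every function name (`predict`, `init_belief`, `update_belief`) are hypothetical illustration choices. The Monte-Carlo tree search used for action selection is not sketched.

```python
import math
import random

# Hypothetical "trained" weights of a tiny linear dynamics network mapping the
# features (state, action, bias) to the next state. Illustrative values only.
WEIGHTS = [0.9, 0.5, 0.1]
DROPOUT_P = 0.3    # assumed probability of dropping each input feature
TRANS_SIGMA = 0.1  # assumed std. dev. of transition noise
OBS_SIGMA = 0.5    # assumed std. dev. of the Gaussian observation model


def sample_mask(rng):
    """Draw one dropout mask; each mask plays the role of one sampled model."""
    return tuple(0.0 if rng.random() < DROPOUT_P else 1.0 for _ in WEIGHTS)


def predict(state, action, mask):
    """Next-state mean under the model selected by `mask`
    (inverted dropout: kept features are rescaled by 1 / (1 - p))."""
    features = (state, action, 1.0)
    scale = 1.0 / (1.0 - DROPOUT_P)
    return sum(w * m * f * scale for w, m, f in zip(WEIGHTS, mask, features))


def gaussian_pdf(x, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def init_belief(n, rng):
    """Joint prior belief: n particles of (state, dropout mask)."""
    return [(rng.gauss(0.0, 1.0), sample_mask(rng)) for _ in range(n)]


def update_belief(particles, action, observation, rng):
    """Propagate each particle with its own sampled model, weight it by the
    observation likelihood, then resample (importance resampling)."""
    proposals, weights = [], []
    for state, mask in particles:
        next_state = predict(state, action, mask) + rng.gauss(0.0, TRANS_SIGMA)
        proposals.append((next_state, mask))
        weights.append(gaussian_pdf(observation, next_state, OBS_SIGMA))
    if sum(weights) == 0.0:
        # Every likelihood underflowed; keep the unweighted proposals.
        return proposals
    return rng.choices(proposals, weights=weights, k=len(particles))


# Usage: filter a short, made-up (action, observation) sequence.
rng = random.Random(0)
belief = init_belief(500, rng)
for action, obs in [(1.0, 1.2), (1.0, 2.1), (-1.0, 1.0)]:
    belief = update_belief(belief, action, obs, rng)
mean_state = sum(s for s, _ in belief) / len(belief)
```

Because each particle keeps its mask, resampling favours masks (models) that explain the observations, so the filter refines its estimate of the dynamics and the state jointly. That is the intuition behind tracking one belief over both.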
