LG AI, Feb 17, 2022

BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs

Sammie Katt, Hai Nguyen, Frans A. Oliehoek, Christopher Amato

arXiv:2202.08884v1, 3.32 citations

Originality: Incremental advance

AI Analysis

This work addresses the scalability problem in Bayesian reinforcement learning for researchers and practitioners dealing with partially observable environments, representing an incremental improvement over previous BRL frameworks.

The paper tackles the challenge of scaling Bayesian reinforcement learning (BRL) under partial observability. It proposes a representation-agnostic formulation and a novel method, BADDr, which uses dropout networks to make belief inference more scalable. The method achieves results competitive with state-of-the-art BRL methods on small domains and also solves larger ones.

While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with varied success. This work presents a representation-agnostic formulation of BRL under partial observability, unifying the previous models under one theoretical umbrella. To demonstrate its practical significance, we also propose a novel derivation, Bayes-Adaptive Deep Dropout RL (BADDr), based on dropout networks. Under this parameterization, in contrast to previous work, the belief over the state and dynamics is a more scalable inference problem. We choose actions through Monte-Carlo tree search and empirically show that our method is competitive with state-of-the-art BRL methods on small domains while being able to solve much larger ones.
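To make the idea of a joint belief over state and dynamics more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the network architecture or how dropout is used during inference. The sketch assumes that holding a dropout mask fixed yields one sampled dynamics model, so a belief particle can be a (state, mask) pair. The names `DropoutDynamics`, `sample_belief`, and `simulate_step` are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class DropoutDynamics:
    """Toy one-hidden-layer network: (state, action) -> next-state distribution.

    Hypothetical illustration; weights are random, not learned.
    """

    def __init__(self, n_states, n_actions, n_hidden=16, keep_prob=0.8, seed=0):
        rng = random.Random(seed)
        self.n_states, self.n_actions = n_states, n_actions
        self.n_hidden, self.keep_prob = n_hidden, keep_prob
        n_in = n_states + n_actions
        self.w1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_states)]

    def sample_mask(self, rng):
        """Draw one dropout mask (assumption: a fixed mask is one sampled model)."""
        return [1.0 if rng.random() < self.keep_prob else 0.0
                for _ in range(self.n_hidden)]

    def next_state_probs(self, state, action, mask):
        """Forward pass with the given mask applied to the hidden layer."""
        x = [0.0] * (self.n_states + self.n_actions)
        x[state] = 1.0                    # one-hot state
        x[self.n_states + action] = 1.0   # one-hot action
        hidden = [
            max(0.0, sum(w * xi for w, xi in zip(row, x))) * m / self.keep_prob
            for row, m in zip(self.w1, mask)
        ]
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        return softmax(logits)


def sample_belief(model, n_particles, rng):
    """Each particle pairs a state hypothesis with a dynamics hypothesis (mask)."""
    return [(rng.randrange(model.n_states), model.sample_mask(rng))
            for _ in range(n_particles)]


def simulate_step(model, particle, action, rng):
    """One simulated transition; the mask is carried over unchanged."""
    state, mask = particle
    probs = model.next_state_probs(state, action, mask)
    next_state = rng.choices(range(model.n_states), weights=probs)[0]
    return (next_state, mask)
```

In a planner such as Monte-Carlo tree search, a function like `simulate_step` would play the role of the simulator: each simulation starts from a particle drawn from the belief and rolls forward under that particle's sampled dynamics.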
