LG · Feb 21

VariBASed: Variational Bayes-Adaptive Sequential Monte-Carlo Planning for Deep Reinforcement Learning

Joery A. de Vries, Jinke He, Yaniv Oren, Pascal R. van der Vaart, Mathijs M. de Weerdt, Matthijs T. J. Spaan

arXiv:2602.18857v1 · 1.4 · h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses the challenge of costly training for Bayes-optimal agents in reinforcement learning, offering a more efficient solution for researchers and practitioners, though it appears incremental as it builds on existing methods.

The paper tackles the problem of efficiently balancing exploration and exploitation in reinforcement learning. It proposes VariBASeD, a variational framework that combines variational belief learning, sequential Monte-Carlo planning, and meta-reinforcement learning, and reports improved sample and runtime efficiency over prior methods in a single-GPU setup.

Optimally trading off exploration and exploitation is the holy grail of reinforcement learning, as it promises maximal data efficiency for solving any task. Bayes-optimal agents achieve this, but obtaining the belief state and performing planning are both typically intractable. Although deep learning methods can greatly help in scaling this computation, existing methods are still costly to train. To accelerate training, this paper proposes a variational framework for learning and planning in Bayes-adaptive Markov decision processes that coalesces variational belief learning, sequential Monte-Carlo planning, and meta-reinforcement learning. In a single-GPU setup, our new method VariBASeD exhibits favorable scaling to larger planning budgets, improving sample and runtime efficiency over prior methods.
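
To make the notions of belief state and Bayes-optimal planning concrete, here is a minimal sketch on a toy problem: a two-armed Bernoulli bandit with Beta priors. This is not the paper's method (it uses no variational belief learning and no sequential Monte-Carlo planning); the function names `update` and `bayes_optimal_value` are hypothetical and chosen only for illustration. The sketch plans exactly over belief states by recursion, which is feasible only because the problem is tiny.

```python
from functools import lru_cache


def update(belief, arm, reward):
    """Bayes update: add a 0/1 reward to the Beta(alpha, beta) counts of `arm`."""
    arms = list(belief)
    alpha, beta = arms[arm]
    arms[arm] = (alpha + reward, beta + (1 - reward))
    return tuple(arms)


@lru_cache(maxsize=None)
def bayes_optimal_value(belief, horizon):
    """Expected total reward of the Bayes-optimal policy over `horizon` pulls."""
    if horizon == 0:
        return 0.0
    best = 0.0
    for arm, (alpha, beta) in enumerate(belief):
        p = alpha / (alpha + beta)  # posterior mean probability of reward 1
        # Average over both outcomes, planning onward from the updated belief.
        win = 1.0 + bayes_optimal_value(update(belief, arm, 1), horizon - 1)
        lose = bayes_optimal_value(update(belief, arm, 0), horizon - 1)
        best = max(best, p * win + (1.0 - p) * lose)
    return best


# Assumed uniform Beta(1, 1) prior on each arm's success probability.
prior = ((1, 1), (1, 1))
print(bayes_optimal_value(prior, 2))
```

The number of reachable belief states grows quickly with the horizon and the number of arms, and in general environments the belief has no closed form at all. That is the intractability the abstract refers to and the reason approximate belief learning and sampling-based planning are used instead.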
