LGMLJul 18, 2022

An Information-Theoretic Analysis of Bayesian Reinforcement Learning

arXiv:2207.08735v12 citationsh-index: 44
Originality Synthesis-oriented
AI Analysis

This work provides theoretical insights into the fundamental limits of Bayesian reinforcement learning, which is incremental as it builds on existing supervised learning frameworks.

The paper extends an information-theoretic framework to analyze Bayesian reinforcement learning, defining minimum Bayesian regret (MBR) and deriving upper bounds using metrics like relative entropy and Wasserstein distance, with applications to multi-armed bandits and online optimization problems.

Building on the framework introduced by Xu and Raginksy [1] for supervised learning problems, we study the best achievable performance for model-based Bayesian reinforcement learning problems. With this purpose, we define minimum Bayesian regret (MBR) as the difference between the maximum expected cumulative reward obtainable either by learning from the collected data or by knowing the environment and its dynamics. We specialize this definition to reinforcement learning problems modeled as Markov decision processes (MDPs) whose kernel parameters are unknown to the agent and whose uncertainty is expressed by a prior distribution. One method for deriving upper bounds on the MBR is presented and specific bounds based on the relative entropy and the Wasserstein distance are given. We then focus on two particular cases of MDPs, the multi-armed bandit problem (MAB) and the online optimization with partial feedback problem. For the latter problem, we show that our bounds can recover from below the current information-theoretic bounds by Russo and Van Roy [2].

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes