MLAILGNov 21, 2019

Information-Theoretic Confidence Bounds for Reinforcement Learning

arXiv:1911.09724v166 citations
Originality Incremental advance
AI Analysis

This work addresses the exploration-exploitation tradeoff in reinforcement learning for algorithm designers, offering a novel theoretical framework that is incremental in extending existing methods.

The paper tackles the problem of designing and analyzing reinforcement learning algorithms by integrating information-theoretic concepts to relate agent performance to information gain, explicitly characterizing the exploration-exploitation tradeoff. It demonstrates applicability in environments like linear bandits, tabular MDPs, and factored MDPs, showing potential for a general approach.

We integrate information-theoretic concepts into the design and analysis of optimistic algorithms and Thompson sampling. By making a connection between information-theoretic quantities and confidence bounds, we obtain results that relate the per-period performance of the agent with its information gain about the environment, thus explicitly characterizing the exploration-exploitation tradeoff. The resulting cumulative regret bound depends on the agent's uncertainty over the environment and quantifies the value of prior information. We show applicability of this approach to several environments, including linear bandits, tabular MDPs, and factored MDPs. These examples demonstrate the potential of a general information-theoretic approach for the design and analysis of reinforcement learning algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes