LGOCMLFeb 1, 2019

An Information-Theoretic Approach to Minimax Regret in Partial Monitoring

arXiv:1902.00470v272 citations
Originality Highly original
AI Analysis

This work addresses theoretical challenges in online learning and decision-making under partial information, providing incremental improvements in regret bounds for specific settings.

The paper tackles the problem of deriving minimax regret bounds in partial monitoring settings without assumptions on adversary signals or decisions, achieving new regret bounds independent of large game-dependent constants and improving the minimax regret for k-armed adversarial bandits by a factor of 2.

We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under partial monitoring with no assumptions on the space of signals or decisions of the adversary. We then generalise the information-theoretic tools of Russo and Van Roy (2016) for proving Bayesian regret bounds and combine them with the minimax theorem to derive minimax regret bounds for various partial monitoring settings. The highlight is a clean analysis of `non-degenerate easy' and `hard' finite partial monitoring, with new regret bounds that are independent of arbitrarily large game-dependent constants. The power of the generalised machinery is further demonstrated by proving that the minimax regret for k-armed adversarial bandits is at most sqrt{2kn}, improving on existing results by a factor of 2. Finally, we provide a simple analysis of the cops and robbers game, also improving best known constants.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes