LG, IT, ML · Aug 25, 2021

A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits

arXiv:2108.11345v4 · 18 citations
AI Analysis

This work addresses risk-averse decision-making under uncertainty in multi-armed bandits, giving researchers and practitioners a unifying theoretical framework for risk-averse Thompson sampling.

The paper tackles the design and analysis of risk-averse Thompson sampling algorithms for multi-armed bandits. It proves asymptotically optimal regret bounds for risk measures such as CVaR and mean-variance (the latter for Bernoulli distributions), and numerical simulations show that these bounds are reasonably tight relative to algorithm-independent lower bounds.

This paper unifies the design and the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem for a class of risk functionals $ρ$ that are continuous and dominant. We prove generalised concentration bounds for these continuous and dominant risk functionals and show that a wide class of popular risk functionals belong to this class. Using our newly developed analytical toolkits, we analyse the algorithm $ρ$-MTS (for multinomial distributions) and prove that it admits asymptotically optimal regret bounds of risk-averse algorithms under CVaR, proportional hazard, and other ubiquitous risk measures. More generally, we prove the asymptotic optimality of $ρ$-MTS for Bernoulli distributions for a class of risk measures known as empirical distribution performance measures (EDPMs); this includes the well-known mean-variance. Numerical simulations show that the regret bounds incurred by our algorithms are reasonably tight vis-à-vis algorithm-independent lower bounds.
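To make the idea of risk-averse Thompson sampling on multinomial arms concrete, here is a minimal Python sketch. It is not taken from the paper or its repository: it assumes a Dirichlet posterior with a uniform prior for each arm, uses lower-tail CVaR as the risk functional $ρ$, and omits any initialisation or tie-breaking details the actual $ρ$-MTS algorithm may specify. The names `cvar` and `rho_mts` and all parameters are hypothetical.

```python
import numpy as np

def cvar(support, probs, alpha):
    """Lower-tail CVaR at level alpha of a discrete reward distribution.

    Assumed convention: rewards are "higher is better", so CVaR is the
    average reward over the worst alpha-fraction of outcomes.
    """
    order = np.argsort(support)
    x = np.asarray(support, dtype=float)[order]
    p = np.asarray(probs, dtype=float)[order]
    # Probability mass lying strictly below each atom.
    cum_before = np.concatenate(([0.0], np.cumsum(p)[:-1]))
    # Portion of each atom's mass that falls inside the worst alpha-fraction.
    tail_mass = np.clip(alpha - cum_before, 0.0, p)
    return float(np.dot(x, tail_mass) / alpha)

def rho_mts(arms, support, horizon, rho, rng):
    """Sketch of Thompson sampling with a risk functional on multinomial arms.

    arms: true probability vectors over `support` (used only to simulate rewards).
    rho:  callable (support, probs) -> float, the risk functional to maximise.
    """
    k, m = len(arms), len(support)
    posterior = np.ones((k, m))        # assumed Dirichlet(1, ..., 1) prior per arm
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        # Draw one plausible distribution per arm and score it with rho.
        scores = [rho(support, rng.dirichlet(posterior[i])) for i in range(k)]
        i = int(np.argmax(scores))
        j = rng.choice(m, p=arms[i])   # index of the observed reward in `support`
        posterior[i, j] += 1           # conjugate Dirichlet update
        pulls[i] += 1
    return pulls

support = [0.0, 0.5, 1.0]
arms = [
    [0.5, 0.0, 0.5],   # mean 0.5, but the worst half of outcomes is all zeros
    [0.0, 1.0, 0.0],   # mean 0.5, always pays 0.5
]
rng = np.random.default_rng(0)
pulls = rho_mts(arms, support, horizon=500,
                rho=lambda s, p: cvar(s, p, alpha=0.5), rng=rng)
print(pulls)
```

Both arms in this toy example have the same mean, so a risk-neutral learner would be indifferent between them. Under CVaR at level 0.5 the second arm is preferable, and the sampler should concentrate its pulls there. Passing a different callable for `rho` (for instance a mean-variance score) changes the risk measure without touching the sampling loop.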

Code Implementations: 1 repo