LG ML

May 14, 2021

Thompson Sampling for Gaussian Entropic Risk Bandits

Ming Liang Ang, Eloise Y. Y. Lim, Joel Q. L. Chang

arXiv:2105.06960v14.42 citations

Originality Synthesis-oriented

AI Analysis

This work addresses risk-aware decision-making in bandits. The contribution is incremental, as it applies an existing method to a specific risk formulation.

The paper tackles the multi-armed bandit problem under an entropic risk measure, providing regret bounds for a Thompson sampling-based algorithm and corresponding lower bounds.

The multi-armed bandit (MAB) problem is a ubiquitous decision-making problem that exemplifies the exploration-exploitation tradeoff. Standard formulations exclude risk in decision making. Risk notably complicates the basic reward-maximising objective, in part because there is no universally agreed definition of it. In this paper, we consider an entropic risk (ER) measure and explore the performance of a Thompson sampling-based algorithm, ERTS, under this risk measure by providing regret bounds for ERTS and corresponding instance-dependent lower bounds.
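
To make the entropic risk measure concrete, the sketch below evaluates it for Gaussian arms. It assumes the common convention ER_gamma(X) = (1/gamma) log E[exp(gamma X)], which for X ~ N(mu, sigma^2) reduces to mu + gamma * sigma^2 / 2. The abstract does not state the paper's sign convention or parameterisation, so this convention is an assumption. The function names, the risk parameter `gamma`, and the two example arms are hypothetical. This is not the ERTS algorithm itself, only an illustration of how the risk measure ranks arms.

```python
import math
import random


def entropic_risk_gaussian(mu, sigma2, gamma):
    """Closed form of (1/gamma) * log E[exp(gamma * X)] for X ~ N(mu, sigma2).

    Assumed convention (not taken from the paper): gamma < 0 penalises
    variance, gamma > 0 rewards it, and gamma = 0 recovers the mean.
    """
    return mu + 0.5 * gamma * sigma2


def entropic_risk_monte_carlo(mu, sigma2, gamma, n=100_000, seed=0):
    """Monte Carlo estimate of the same quantity, as a sanity check.

    Requires gamma != 0, since the estimate divides by gamma.
    """
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    total = sum(math.exp(gamma * rng.gauss(mu, sd)) for _ in range(n))
    return math.log(total / n) / gamma


# Hypothetical arms given as (mean, variance) pairs.
arms = {"safe": (1.0, 0.25), "risky": (1.2, 4.0)}
gamma = -0.5  # risk-averse under the assumed convention

risks = {name: entropic_risk_gaussian(mu, s2, gamma) for name, (mu, s2) in arms.items()}
best_arm = max(risks, key=risks.get)
```

Under these assumed values the higher-mean arm is not the preferred one, because its variance penalty outweighs its mean advantage. This is the sense in which risk changes the reward-maximising objective.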
