LG · Jan 9, 2013

Risk-Aversion in Multi-armed Bandits

arXiv:1301.1936v1 · 38 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for risk-aware decision-making in practical applications where maximizing expected reward is insufficient. It appears incremental, however, because it builds on standard bandit frameworks.

The paper tackles the problem of risk-aversion in multi-armed bandits by introducing a novel setting that competes against the arm with the best risk-return trade-off, using variance as a risk measure, and reports preliminary empirical results.

Stochastic multi-armed bandits solve the exploration-exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk-aversion, where the objective is to compete against the arm with the best risk-return trade-off. This setting proves to be intrinsically more difficult than the standard multi-armed bandit setting, due in part to an exploration risk that introduces a regret associated with the variability of an algorithm. Using variance as a measure of risk, we introduce two new algorithms, investigate their theoretical guarantees, and report preliminary empirical results.
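To make the risk-return trade-off concrete, here is a minimal Python sketch. It assumes a mean-variance score of the form "variance minus rho times mean", where lower is better and rho is a risk-tolerance parameter. That form, the names `mean_variance` and `best_arm`, and the sample rewards are illustrative assumptions, not details taken from the abstract, and the sketch only ranks arms from already-collected rewards; it is not one of the paper's two algorithms.

```python
from statistics import fmean, pvariance


def mean_variance(rewards, rho):
    """Empirical score: variance - rho * mean (assumed form; lower is better)."""
    return pvariance(rewards) - rho * fmean(rewards)


def best_arm(samples_by_arm, rho):
    """Return the arm with the lowest empirical score, plus all scores."""
    scores = {arm: mean_variance(r, rho) for arm, r in samples_by_arm.items()}
    return min(scores, key=scores.get), scores


# Hypothetical reward histories: one low-variance arm, one with a higher mean
# but much higher variance.
samples = {
    "steady": [0.5, 0.5, 0.5, 0.5],
    "volatile": [0.0, 1.0, 0.0, 1.0, 1.0],
}

risk_averse_choice, _ = best_arm(samples, rho=1.0)
risk_tolerant_choice, _ = best_arm(samples, rho=4.0)
```

With these numbers, a small rho (strong risk-aversion) selects the "steady" arm, while a large rho weights the mean more heavily and selects the "volatile" arm. An algorithm that maximizes expected reward alone would always prefer "volatile".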

