LG
Jan 6, 2014

Exploration vs Exploitation vs Safety: Risk-averse Multi-Armed Bandits

arXiv:1401.1123v1
122 citations
Originality: Incremental advance
AI Analysis

This work addresses risk management in decision-making for applications like energy management, presenting an incremental improvement over existing risk-aware bandit algorithms.

The paper tackles the problem of balancing exploration, exploitation, and safety in multi-armed bandits. It introduces the MARAB algorithm, which scores each arm by its conditional value at risk in order to limit the exploration of risky arms. A theoretical analysis establishes the robustness of the related MIN algorithm relative to UCB, and experiments compare MIN and MARAB against UCB and other risk-aware methods.

Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MARAB) algorithm. With the goal of limiting the exploration of risky arms, MARAB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MARAB tends toward the MIN multi-armed bandit algorithm, which aims at the arm with the maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness relative to UCB. The analysis is supported by extensive experimental validation of MIN and MARAB compared to UCB and state-of-the-art risk-aware MAB algorithms on artificial and real-world problems.
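
To make the role of the risk level concrete, here is a minimal sketch of an empirical conditional value at risk used as an arm score. It is an illustration under stated assumptions, not the paper's MARAB procedure: the names `empirical_cvar` and `pick_arm_by_cvar` are hypothetical, the estimator simply averages the worst `ceil(alpha * n)` observed rewards, and no exploration term is included. With `alpha = 1` the score is the empirical mean; as `alpha` shrinks, it falls back to the empirical minimum, which mirrors the MIN criterion described above.

```python
import math
from typing import Sequence


def empirical_cvar(rewards: Sequence[float], alpha: float) -> float:
    """Average of the worst ceil(alpha * n) rewards (lower tail).

    Hypothetical helper: a plain plug-in estimate, assumed here for
    illustration only. Requires 0 < alpha <= 1.
    """
    if not rewards:
        raise ValueError("need at least one observed reward")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    k = max(1, math.ceil(alpha * len(rewards)))  # size of the lower tail
    worst = sorted(rewards)[:k]                  # k smallest rewards
    return sum(worst) / k


def pick_arm_by_cvar(history: Sequence[Sequence[float]], alpha: float) -> int:
    """Return the index of the arm with the highest empirical CVaR.

    Greedy selection only; a real bandit algorithm would also have to
    decide how to explore arms with few observations.
    """
    scores = [empirical_cvar(rewards, alpha) for rewards in history]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data (made up for this sketch): arm 0 has the higher mean but an
# occasional very low reward; arm 1 is lower on average but steady.
history = [
    [1.0, 0.1, 1.0, 0.9],
    [0.5, 0.4, 0.6, 0.5],
]

risk_neutral_choice = pick_arm_by_cvar(history, alpha=1.0)   # compares means
risk_averse_choice = pick_arm_by_cvar(history, alpha=0.25)   # compares minima
```

On this toy data the risk-neutral setting prefers arm 0 (higher mean), while the small risk level prefers arm 1 (higher worst-case reward), which is the trade-off the abstract describes.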

