LGMLMay 15, 2017

Bandit Regret Scaling with the Effective Loss Range

arXiv:1705.05091v39 citations
Originality Incremental advance
AI Analysis

This work addresses a fundamental challenge in bandit optimization for scenarios with small effective loss ranges, offering potential improvements in regret bounds for applications like online learning and decision-making, though it is incremental as it builds on existing bandit frameworks with specific assumptions.

The paper tackles the problem of improving regret guarantees in nonstochastic multi-armed bandits by leveraging the effective loss range, showing that under mild assumptions like rough loss estimates or advance knowledge of a single arm's loss, it is possible to achieve regret scaling with this effective range despite prior impossibility results.

We study how the regret guarantees of nonstochastic multi-armed bandits can be improved, if the effective range of the losses in each round is small (e.g. the maximal difference between two losses in a given round). Despite a recent impossibility result, we show how this can be made possible under certain mild additional assumptions, such as availability of rough estimates of the losses, or advance knowledge of the loss of a single, possibly unspecified arm. Along the way, we develop a novel technique which might be of independent interest, to convert any multi-armed bandit algorithm with regret depending on the loss range, to an algorithm with regret depending only on the effective range, while avoiding predictably bad arms altogether.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes