LGNov 11, 2014

Bounded Regret for Finite-Armed Structured Bandits

arXiv:1411.2919v170 citations
Originality Incremental advance
AI Analysis

This addresses a structured bandit problem for machine learning and decision-making, offering theoretical guarantees but appearing incremental in nature.

The authors tackled the problem of K-armed bandits where arm returns are interdependent, presenting a new algorithm that achieves finite expected cumulative regret under certain conditions, with problem-dependent lower bounds showing near-optimality in special cases.

We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms. We present a new algorithm for this general class of problems and show that under certain circumstances it is possible to achieve finite expected cumulative regret. We also give problem-dependent lower bounds on the cumulative regret showing that at least in special cases the new algorithm is nearly optimal.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes