LGDSMar 6, 2025

Greedy Algorithm for Structured Bandits: A Sharp Characterization of Asymptotic Success / Failure

arXiv:2503.04010v33 citationsh-index: 38
Originality Incremental advance
AI Analysis

This provides a foundational characterization for bandit algorithms, applicable to contextual bandits and interactive decision-making, though it is incremental in extending prior work to arbitrary finite reward structures.

The paper tackles the problem of determining when the greedy algorithm asymptotically succeeds or fails in structured bandit problems, showing that a partial identifiability property is necessary and sufficient for sublinear regret, and that this condition makes the problem easy for any non-degenerate algorithm.

We study the greedy (exploitation-only) algorithm in bandit problems with a known reward structure. We allow arbitrary finite reward structures, while prior work focused on a few specific ones. We fully characterize when the greedy algorithm asymptotically succeeds or fails, in the sense of sublinear vs. linear regret as a function of time. Our characterization identifies a partial identifiability property of the problem instance as the necessary and sufficient condition for the asymptotic success. Notably, once this property holds, the problem becomes easy -- any algorithm will succeed (in the same sense as above), provided it satisfies a mild non-degeneracy condition. Our characterization extends to contextual bandits and interactive decision-making with arbitrary feedback. Examples demonstrating broad applicability and extensions to infinite reward structures are provided.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes