LGAIMLMay 16, 2024

The Real Price of Bandit Information in Multiclass Classification

arXiv:2405.10027v27 citationsh-index: 31COLT
Originality Incremental advance
AI Analysis

This work addresses the efficiency of learning in bandit multiclass classification, providing tighter regret bounds that are incremental but important for scenarios with limited feedback.

The paper tackles the problem of multiclass classification with bandit feedback, showing that the minimax regret is more nuanced than previously thought, with a new algorithm achieving regret O(|H|+√T) and matching lower bounds, improving over classical algorithms for moderately-sized hypothesis classes.

We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input classifies to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary inquiry is with regard to the dependency on the number of labels $K$, and whether $T$-step regret bounds in this setting can be improved beyond the $\smash{\sqrt{KT}}$ dependence exhibited by existing algorithms. Our main contribution is in showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form $\smash{\widetildeΘ\left(\min \left\{|H| + \sqrt{T}, \sqrt{KT \log |H|} \right\} \right) }$, where $H$ is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}(|H|+\sqrt{T})}$, improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log-factors) in all parameter regimes.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes