MLLGOCFeb 17, 2020

Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

arXiv:2002.07258v225 citations
AI Analysis

This work addresses a computational bottleneck in bandit algorithms for large-scale combinatorial settings, enabling practical applications in areas like online advertising or recommendation systems.

The paper tackles the problem of combinatorial semi-bandits with uncorrelated rewards, where existing algorithms like ESCB have exponential computational complexity in large dimensions. The authors propose a new algorithm that achieves the same regret bound of O(d (ln m)^2 (ln T) / Δ_min) but with polynomial-time computational complexity O(T poly(d)), making it both statistically and computationally efficient.

We consider combinatorial semi-bandits over a set of arms ${\cal X} \subset \{0,1\}^d$ where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = {\cal O}\Big( {d (\ln m)^2 (\ln T) \over Δ_{\min} }\Big)$, but it has computational complexity ${\cal O}(|{\cal X}|)$ which is typically exponential in $d$, and cannot be used in large dimensions. We propose the first algorithm which is both computationally and statistically efficient for this problem with regret $R(T) = {\cal O} \Big({d (\ln m)^2 (\ln T)\over Δ_{\min} }\Big)$ and computational complexity ${\cal O}(T {\bf poly}(d))$. Our approach involves carefully designing an approximate version of ESCB with the same regret guarantees, showing that this approximate algorithm can be implemented in time ${\cal O}(T {\bf poly}(d))$ by repeatedly maximizing a linear function over ${\cal X}$ subject to a linear budget constraint, and showing how to solve this maximization problems efficiently.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes