ML LG OCFeb 17, 2020

Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

Thibaut Cuvelier, Richard Combes, Eric Gourdin

arXiv:2002.07258v210.925 citationsHas Code

Originality Highly original

AI Analysis

This work addresses a computational bottleneck in bandit algorithms for large-scale combinatorial settings, enabling practical applications in areas like online advertising or recommendation systems.

The paper tackles the problem of combinatorial semi-bandits with uncorrelated rewards, where existing algorithms like ESCB have exponential computational complexity in large dimensions. The authors propose a new algorithm that achieves the same regret bound of O(d (ln m)^2 (ln T) / Δ_min) but with polynomial-time computational complexity O(T poly(d)), making it both statistically and computationally efficient.

We consider combinatorial semi-bandits over a set of arms ${\cal X} \subset \{0,1\}^d$ where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = {\cal O}\Big( {d (\ln m)^2 (\ln T) \over Δ_{\min} }\Big)$, but it has computational complexity ${\cal O}(|{\cal X}|)$ which is typically exponential in $d$, and cannot be used in large dimensions. We propose the first algorithm which is both computationally and statistically efficient for this problem with regret $R(T) = {\cal O} \Big({d (\ln m)^2 (\ln T)\over Δ_{\min} }\Big)$ and computational complexity ${\cal O}(T {\bf poly}(d))$. Our approach involves carefully designing an approximate version of ESCB with the same regret guarantees, showing that this approximate algorithm can be implemented in time ${\cal O}(T {\bf poly}(d))$ by repeatedly maximizing a linear function over ${\cal X}$ subject to a linear budget constraint, and showing how to solve this maximization problems efficiently.

View on arXiv PDF Code

Similar