Linear Bandits on Uniformly Convex Sets
This addresses an open question in bandit optimization, offering improved theoretical guarantees for specific convex sets, though it is incremental in extending known results to broader classes.
The paper tackles the problem of improving pseudo-regret bounds for linear bandit algorithms on convex action sets, showing that for strongly convex sets beyond ℓ_p balls, they achieve Õ(√(nT)) bounds, and for uniformly convex sets, they obtain bounds with dimension dependency better than O(√n) but with varying asymptotic rates in T.
Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets $\mathcal{K}\subset\mathbb{R}^n$ and two types of structural assumptions lead to better pseudo-regret bounds. When $\mathcal{K}$ is the simplex or an $\ell_p$ ball with $p\in]1,2]$, there exist bandits algorithms with $\tilde{\mathcal{O}}(\sqrt{nT})$ pseudo-regret bounds. Here, we derive bandit algorithms for some strongly convex sets beyond $\ell_p$ balls that enjoy pseudo-regret bounds of $\tilde{\mathcal{O}}(\sqrt{nT})$, which answers an open question from [BCB12, §5.5.]. Interestingly, when the action set is uniformly convex but not necessarily strongly convex, we obtain pseudo-regret bounds with a dimension dependency smaller than $\mathcal{O}(\sqrt{n})$. However, this comes at the expense of asymptotic rates in $T$ varying between $\tilde{\mathcal{O}}(\sqrt{T})$ and $\tilde{\mathcal{O}}(T)$.