Tight Bounds for Bandit Combinatorial Optimization
This resolves fundamental open problems in sequential decision-making under uncertainty for researchers in machine learning and optimization, though it is incremental in refining theoretical bounds.
The paper tackles the problem of determining optimal regret rates in bandit combinatorial optimization, proving a tight bound of $\widetilde{\Theta}(k^{3/2}\sqrt{dT})$ and disproving a previous conjecture of $\widetilde{\Theta}(k\sqrt{dT})$, with implications for instances like the bandit shortest path problem.
We revisit the study of optimal regret rates in bandit combinatorial optimization---a fundamental framework for sequential decision making under uncertainty that abstracts numerous combinatorial prediction problems. We prove that the attainable regret in this setting grows as $\widetildeΘ(k^{3/2}\sqrt{dT})$ where $d$ is the dimension of the problem and $k$ is a bound over the maximal instantaneous loss, disproving a conjecture of Audibert, Bubeck, and Lugosi (2013) who argued that the optimal rate should be of the form $\widetildeΘ(k\sqrt{dT})$. Our bounds apply to several important instances of the framework, and in particular, imply a tight bound for the well-studied bandit shortest path problem. By that, we also resolve an open problem posed by Cesa-Bianchi and Lugosi (2012).