Logarithmic Regret from Sublinear Hints
This work studies how many hints an online learning algorithm actually needs, which makes hint-based methods more practical when hints are scarce; it is incremental in that it builds on prior results about learning with hints.
The paper shows that, in online linear optimization, O(log T) regret can be achieved with only O(√T) hints rather than a hint at every step, and proves that with o(√T) hints no algorithm can guarantee better than Ω(√T) regret.
We consider the online linear optimization problem, where at every step the algorithm plays a point $x_t$ in the unit ball, and suffers loss $\langle c_t, x_t\rangle$ for some cost vector $c_t$ that is then revealed to the algorithm. Recent work showed that if an algorithm receives a hint $h_t$ that has non-trivial correlation with $c_t$ before it plays $x_t$, then it can achieve a regret guarantee of $O(\log T)$, improving on the bound of $\Theta(\sqrt{T})$ in the standard setting. In this work, we study the question of whether an algorithm really requires a hint at every time step. Somewhat surprisingly, we show that an algorithm can obtain $O(\log T)$ regret with just $O(\sqrt{T})$ hints under a natural query model; in contrast, we also show that $o(\sqrt{T})$ hints cannot guarantee better than $\Omega(\sqrt{T})$ regret. We give two applications of our result, to the well-studied setting of optimistic regret bounds and to the problem of online learning with abstention.
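To make the setting concrete, the following is a minimal sketch of the standard (hint-free) protocol, assuming unit-norm cost vectors and projected online gradient descent with step size $1/\sqrt{t}$. It is the textbook baseline that attains $O(\sqrt{T})$ regret, not the hint-based algorithm studied in this work; the function names `project_to_unit_ball`, `ogd_regret`, and `random_unit_costs` are hypothetical and chosen only for illustration.

```python
import math
import random


def project_to_unit_ball(x):
    """Euclidean projection of x onto the unit ball."""
    norm = math.sqrt(sum(v * v for v in x))
    return x if norm <= 1.0 else [v / norm for v in x]


def ogd_regret(costs):
    """Play projected online gradient descent and return its regret.

    Assumption: step size eta_t = 1/sqrt(t), starting from the origin.
    Regret is measured against the best fixed point in the unit ball,
    whose total loss is -||sum_t c_t||.
    """
    d = len(costs[0])
    x = [0.0] * d
    total_loss = 0.0
    cost_sum = [0.0] * d
    for t, c in enumerate(costs, start=1):
        # The algorithm commits to x_t, then c_t is revealed.
        total_loss += sum(ci * xi for ci, xi in zip(c, x))
        cost_sum = [s + ci for s, ci in zip(cost_sum, c)]
        eta = 1.0 / math.sqrt(t)
        x = project_to_unit_ball([xi - eta * ci for xi, ci in zip(x, c)])
    best_fixed_loss = -math.sqrt(sum(s * s for s in cost_sum))
    return total_loss - best_fixed_loss


def random_unit_costs(T, d, seed=0):
    """Hypothetical test data: T random unit-norm cost vectors in R^d."""
    rng = random.Random(seed)
    costs = []
    for _ in range(T):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        n = math.sqrt(sum(v * v for v in g))
        costs.append([v / n for v in g])
    return costs


if __name__ == "__main__":
    for T in (100, 1000, 10000):
        regret = ogd_regret(random_unit_costs(T, d=5))
        print(T, round(regret, 3), "sqrt(T) =", round(math.sqrt(T), 3))
```

Under these assumptions the standard analysis bounds the regret of this baseline by $3\sqrt{T}$; the results above concern how hints correlated with $c_t$ can reduce this to $O(\log T)$.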