Learning The Best Expert Efficiently
This work addresses an efficiency challenge in online learning that arises when the regrets of the experts differ widely, and it offers incremental improvements over existing methods.
The paper tackles the problem of achieving regret of the same order as the lowest regret among K experts in online learning, which is a stronger requirement than standard guarantees with respect to the best expert. It shows that a lazy form of the online subgradient algorithm achieves minimal regret in a number of easy regimes while retaining an $O(\sqrt{n})$ worst-case regret guarantee, and that for certain classes of problem minimal regret strategies exist for some of the remaining hard regimes.
We consider online learning problems where the aim is to achieve regret that is efficient in the sense that it is of the same order as the lowest regret amongst K experts. This is a substantially stronger requirement than achieving $O(\sqrt{n})$ or $O(\log n)$ regret with respect to the best expert, and standard algorithms are insufficient even in easy cases where the regrets of the available actions are very different from one another. We show that a particular lazy form of the online subgradient algorithm can be used to achieve minimal regret in a number of "easy" regimes while retaining an $O(\sqrt{n})$ worst-case regret guarantee. We also show that for certain classes of problem minimal regret strategies exist for some of the remaining "hard" regimes.
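To make the phrase "lazy form of the online subgradient algorithm" concrete, the following is a minimal sketch of one common lazy variant, in which the learner projects the scaled negative cumulative loss vector onto the probability simplex at every round. It is an illustration under stated assumptions, not necessarily the exact algorithm analysed in the paper: the simplex domain, linear losses, the fixed step size `eta`, and the helper names `project_to_simplex`, `lazy_subgradient`, and `regret_to_each_expert` are all choices made here for the example.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of the vector v onto the probability simplex."""
    k = v.size
    u = np.sort(v)[::-1]                      # entries in decreasing order
    css = np.cumsum(u) - 1.0                  # cumulative sums minus the target mass
    idx = np.arange(1, k + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]  # last index kept in the support
    tau = css[rho] / (rho + 1)                # shift that makes the result sum to 1
    return np.maximum(v - tau, 0.0)


def lazy_subgradient(losses, eta):
    """Play weights over K experts for n rounds.

    losses: array of shape (n, K), losses[t, k] is the loss of expert k in round t.
    eta:    fixed step size (an assumption of this sketch).
    Returns an array of shape (n, K) holding the weights played in each round.
    """
    n, k = losses.shape
    cumulative = np.zeros(k)
    plays = np.empty((n, k))
    for t in range(n):
        # Lazy step: project the scaled sum of all past subgradients,
        # rather than updating the previous projected point.
        plays[t] = project_to_simplex(-eta * cumulative)
        # For the linear loss <w, losses[t]> the subgradient is losses[t].
        cumulative += losses[t]
    return plays


def regret_to_each_expert(losses, plays):
    """Learner's total expected loss minus the total loss of each expert."""
    learner_loss = np.sum(plays * losses)
    return learner_loss - losses.sum(axis=0)


if __name__ == "__main__":
    # Hypothetical "easy" instance: expert 0 never errs, expert 1 always does.
    n = 100
    losses = np.tile(np.array([0.0, 1.0]), (n, 1))
    plays = lazy_subgradient(losses, eta=1.0)
    print(regret_to_each_expert(losses, plays))
```

On this toy instance the learner starts from uniform weights and then concentrates on the better expert, so its regret with respect to that expert stays bounded as n grows. How the step size should be chosen to obtain the guarantees stated in the abstract is not specified here and is left open by this sketch.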