Online Improper Learning with an Approximation Oracle
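This work targets the computational cost of online learning when the learner can only access the offline problem through an approximation oracle. It is relevant to both researchers and practitioners, though the contribution appears incremental, since it builds on prior methods for oracle-based optimization.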
The paper tackles the problem of reducing online learning to approximate optimization of the offline problem. In the full information setting, it presents algorithms that achieve optimal regret with only poly-logarithmically many oracle calls per iteration. In the bandit setting, it significantly improves on the best previously known oracle complexity while maintaining the same regret.
We revisit the question of reducing online learning to approximate optimization of the offline problem. In this setting, we give two algorithms with near-optimal performance in the full information setting: they guarantee optimal regret and require only poly-logarithmically many calls to the approximation oracle per iteration. Furthermore, these algorithms apply to the more general problem of improper learning. In the bandit setting, our algorithm also significantly improves the best previously known oracle complexity while maintaining the same regret.
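For concreteness, the regret notion usually paired with an approximation oracle can be sketched as follows. The notation is an assumption made for illustration and does not appear in the text above: $\mathcal{K}$ is the decision set, $f_t$ is the loss revealed at round $t$, $x_t$ is the learner's play, and $\alpha \ge 1$ is the approximation ratio of the oracle for a minimization problem.

\[
\alpha\text{-regret}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \alpha \cdot \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x)
\]

With $\alpha = 1$ this reduces to the standard notion of regret. Loosely speaking, in improper learning the plays $x_t$ are not required to come from the same class as the comparator $x$.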