Online Non-Additive Path Learning under Full and Partial Information
This work addresses a central problem in online learning with applications like ensemble structured prediction, but it appears incremental as it builds on existing methods for additive gains.
The paper tackles online path learning with non-additive gains by presenting new algorithms for full information, semi-bandit, and full bandit settings, achieving favorable regret guarantees, and applies these to ensemble structured prediction.
We study the problem of online path learning with non-additive gains, which is a central problem appearing in several applications, including ensemble structured prediction. We present new online algorithms for path learning with non-additive count-based gains for the three settings of full information, semi-bandit and full bandit with very favorable regret guarantees. A key component of our algorithms is the definition and computation of an intermediate context-dependent automaton that enables us to use existing algorithms designed for additive gains. We further apply our methods to the important application of ensemble structured prediction. Finally, beyond count-based gains, we give an efficient implementation of the EXP3 algorithm for the full bandit setting with an arbitrary (non-additive) gain.