LG · Feb 13, 2024

Online Structured Prediction with Fenchel--Young Losses and Improved Surrogate Regret for Online Multiclass Classification with Logistic Loss

arXiv:2402.08180v3 · 7 citations · h-index: 10 · COLT
Originality: Incremental advance
AI Analysis

This work addresses online structured prediction. It provides tighter regret bounds and extends a key framework to a broader class of losses, though the advance is incremental.

This paper extends the exploit-the-surrogate-gap framework from online multiclass classification to online structured prediction using Fenchel-Young losses, achieving finite surrogate regret bounds. For online multiclass classification with logistic loss, it improves the surrogate regret bound from O(d||U||_F^2) to O(||U||_F^2), where d is the number of classes and U is the best offline linear estimator.

This paper studies online structured prediction with full-information feedback. For online multiclass classification, Van der Hoeven (2020) established \emph{finite} surrogate regret bounds, which are independent of the time horizon, by introducing an elegant \emph{exploit-the-surrogate-gap} framework. However, this framework has been limited to multiclass classification primarily because it relies on a classification-specific procedure for converting estimated scores to outputs. We extend the exploit-the-surrogate-gap framework to online structured prediction with \emph{Fenchel--Young losses}, a large family of surrogate losses that includes the logistic loss for multiclass classification as a special case, obtaining finite surrogate regret bounds in various structured prediction problems. To this end, we propose and analyze \emph{randomized decoding}, which converts estimated scores to general structured outputs. Moreover, by applying our decoding to online multiclass classification with the logistic loss, we obtain a surrogate regret bound of $O(\| \mathbf{U} \|_\mathrm{F}^2)$, where $\mathbf{U}$ is the best offline linear estimator and $\| \cdot \|_\mathrm{F}$ denotes the Frobenius norm. This bound is tight up to logarithmic factors and improves the previous bound of $O(d\| \mathbf{U} \|_\mathrm{F}^2)$ due to Van der Hoeven (2020) by a factor of $d$, the number of classes.
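The claim that the logistic loss is a special case of a Fenchel--Young loss can be checked numerically. The sketch below is illustrative and is not code from the paper. It assumes the standard definition $L_\Omega(\theta; y) = \Omega^*(\theta) + \Omega(y) - \langle \theta, y \rangle$ with $\Omega$ the negative Shannon entropy on the probability simplex, whose conjugate is log-sum-exp. The function names and the example scores are hypothetical, and the natural logarithm is used here, whereas the paper may use a different base.

```python
import math


def logsumexp(theta):
    """Numerically stable log(sum(exp(theta_i)))."""
    m = max(theta)
    return m + math.log(sum(math.exp(t - m) for t in theta))


def softmax(theta):
    """Softmax probabilities computed via logsumexp."""
    lse = logsumexp(theta)
    return [math.exp(t - lse) for t in theta]


def neg_entropy(p):
    """Omega(p) = sum_i p_i log p_i, using the convention 0 log 0 = 0."""
    return sum(pi * math.log(pi) for pi in p if pi > 0.0)


def fenchel_young_loss(theta, y):
    """Omega^*(theta) + Omega(y) - <theta, y> for the negative Shannon entropy."""
    inner = sum(t * yi for t, yi in zip(theta, y))
    return logsumexp(theta) + neg_entropy(y) - inner


def logistic_loss(theta, label):
    """Multiclass logistic (softmax cross-entropy) loss with natural log."""
    return -math.log(softmax(theta)[label])


# Hypothetical scores for a 3-class problem; the true class is 0.
theta = [2.0, -1.0, 0.5]
label = 0
y = [1.0 if i == label else 0.0 for i in range(len(theta))]

fy = fenchel_young_loss(theta, y)
ce = logistic_loss(theta, label)
print(abs(fy - ce) < 1e-12)
```

For a one-hot target the entropy term vanishes, so the Fenchel--Young loss reduces to log-sum-exp minus the score of the true class, which is the logistic loss. The loss is also zero when the target equals `softmax(theta)`, the regularized prediction for this choice of $\Omega$.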
