Online Structured Prediction with Fenchel--Young Losses and Improved Surrogate Regret for Online Multiclass Classification with Logistic Loss
This work addresses the problem of online structured prediction for machine learning researchers, providing tighter regret bounds and extending a key framework to a broader class of losses, though it is incremental in nature.
This paper extends the exploit-the-surrogate-gap framework from online multiclass classification to online structured prediction using Fenchel-Young losses, achieving finite surrogate regret bounds. For online multiclass classification with logistic loss, it improves the surrogate regret bound from O(d||U||_F^2) to O(||U||_F^2), where d is the number of classes and U is the best offline linear estimator.
This paper studies online structured prediction with full-information feedback. For online multiclass classification, Van der Hoeven (2020) established \emph{finite} surrogate regret bounds, which are independent of the time horizon, by introducing an elegant \emph{exploit-the-surrogate-gap} framework. However, this framework has been limited to multiclass classification primarily because it relies on a classification-specific procedure for converting estimated scores to outputs. We extend the exploit-the-surrogate-gap framework to online structured prediction with \emph{Fenchel--Young losses}, a large family of surrogate losses that includes the logistic loss for multiclass classification as a special case, obtaining finite surrogate regret bounds in various structured prediction problems. To this end, we propose and analyze \emph{randomized decoding}, which converts estimated scores to general structured outputs. Moreover, by applying our decoding to online multiclass classification with the logistic loss, we obtain a surrogate regret bound of $O(\| \mathbf{U} \|_\mathrm{F}^2)$, where $\mathbf{U}$ is the best offline linear estimator and $\| \cdot \|_\mathrm{F}$ denotes the Frobenius norm. This bound is tight up to logarithmic factors and improves the previous bound of $O(d\| \mathbf{U} \|_\mathrm{F}^2)$ due to Van der Hoeven (2020) by a factor of $d$, the number of classes.