Linear-Core Surrogates: Smooth Loss Functions with Linear Rates for Classification and Structured Prediction
For practitioners in classification and structured prediction, Linear-Core Surrogates resolve the trade-off between smoothness and linear consistency, offering faster training and better robustness to label noise.
The paper introduces Linear-Core (LC) Surrogates, a family of convex loss functions that achieve both smoothness and linear consistency bounds, enabling fast optimization and statistical efficiency. In structured prediction, this smoothness allows unbiased stochastic gradient estimation without quadratic complexity, yielding a 23× speedup over Structured SVMs on sequence tagging and a 2.6% improvement over Cross-Entropy on noisy CIFAR-10.
The choice of loss function in classification involves a fundamental trade-off: smooth losses (like Cross-Entropy) enable fast optimization rates but yield slow square-root consistency bounds, while piecewise-linear losses (like Hinge) offer fast linear consistency rates but suffer from non-differentiability. We propose Linear-Core (LC) Surrogates, a new family of convex loss functions that resolve this tension by stitching a linear core to a smooth tail. We prove that these surrogates are differentiable everywhere while retaining strict linear $H$-consistency bounds, effectively combining the optimization benefits of smoothness with the statistical efficiency of margin-based losses. In the structured prediction setting, we show that this smoothness unlocks a massive computational and energy advantage: it allows for an unbiased stochastic gradient estimator that bypasses the quadratic complexity $O(|\mathscr{Y}|^2)$ of exact inference (e.g., Viterbi). Empirically, our method achieves a 23$\times$ speedup over Structured SVMs on large-vocabulary sequence tagging tasks and demonstrates superior robustness to instance-dependent label noise, outperforming Cross-Entropy by 2.6% on corrupted CIFAR-10.
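To make the idea of "stitching a linear core to a smooth tail" concrete, here is a minimal sketch of one convex, everywhere-differentiable margin loss built that way. This is an illustrative assumption, not the paper's definition: the function names `lc_loss` and `lc_grad`, the junction at margin 0, and the exponential tail are all hypothetical choices, picked only because the value and slope of the two pieces match at the junction.

```python
import math


def lc_loss(margin: float) -> float:
    """Illustrative stitched loss (hypothetical, not the paper's formula).

    Linear piece 1 - margin for margin <= 0, exponential tail exp(-margin)
    for margin > 0. Both pieces equal 1 with slope -1 at margin = 0, so the
    loss is continuous and differentiable at the junction.
    """
    if margin <= 0.0:
        return 1.0 - margin
    return math.exp(-margin)


def lc_grad(margin: float) -> float:
    """Derivative of lc_loss with respect to the margin."""
    if margin <= 0.0:
        return -1.0
    return -math.exp(-margin)


if __name__ == "__main__":
    # Print the loss and its slope on a few margins around the junction.
    for m in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"margin={m:+.1f}  loss={lc_loss(m):.4f}  grad={lc_grad(m):.4f}")
```

Because the slope is constant at -1 on the linear piece and then rises monotonically toward 0 on the tail, the derivative is nondecreasing, which is what makes this particular stitched function convex. Unlike the hinge loss, it has no kink where the two pieces meet.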