Weston-Watkins Hinge Loss and Ordered Partitions
This work addresses a theoretical gap in multiclass SVM formulations and explains the empirical success of the Weston-Watkins variant, particularly in noisy-label settings.
The paper addresses the lack of 0-1 calibration of the Weston-Watkins (WW) hinge loss in multiclass SVMs. It introduces a novel discrete loss function, the ordered partition loss, proves that the WW-hinge loss is calibrated with respect to it, and uses this theory to explain the empirical finding that the WW-SVM performs well under massive label noise.
Multiclass extensions of the support vector machine (SVM) have been formulated in a variety of ways. A recent empirical comparison of nine such formulations [Doğan et al. 2016] recommends the variant proposed by Weston and Watkins (WW), even though the WW-hinge loss is not calibrated with respect to the 0-1 loss. In this work, we introduce a novel discrete loss function for multiclass classification, the ordered partition loss, and prove that the WW-hinge loss is calibrated with respect to this loss. We also argue that the ordered partition loss is maximally informative among discrete losses satisfying this property. Finally, we apply our theory to justify the empirical observation by Doğan et al. that the WW-SVM can work well even under massive label noise, a challenging setting for multiclass SVMs.
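For readers unfamiliar with the WW-hinge loss, the sketch below shows its standard form with unit margin, L(v, y) = Σ_{j≠y} max(0, 1 − (v_y − v_j)), alongside the 0-1 loss of the argmax prediction and the conditional risk (expected loss when the label is drawn from a distribution p), which is the quantity calibration arguments reason about. This is a minimal illustration under those assumptions. The function names are hypothetical, and the ordered partition loss is not implemented here because the abstract does not define it.

```python
from typing import Sequence


def ww_hinge_loss(scores: Sequence[float], y: int) -> float:
    """Weston-Watkins hinge loss (assumed unit-margin form):
    sum over j != y of max(0, 1 - (v_y - v_j))."""
    return sum(
        max(0.0, 1.0 - (scores[y] - scores[j]))
        for j in range(len(scores))
        if j != y
    )


def zero_one_loss(scores: Sequence[float], y: int) -> float:
    """0-1 loss of the argmax prediction; Python's max() keeps the
    first maximal index, so ties go to the lowest index."""
    pred = max(range(len(scores)), key=lambda k: scores[k])
    return 0.0 if pred == y else 1.0


def conditional_ww_risk(p: Sequence[float], scores: Sequence[float]) -> float:
    """Expected WW-hinge loss when the label y is drawn from distribution p."""
    return sum(p_y * ww_hinge_loss(scores, y) for y, p_y in enumerate(p))


if __name__ == "__main__":
    v = [2.0, 0.5, 1.5]
    print(ww_hinge_loss(v, 0))   # 0.5: only class 2 violates the unit margin
    print(zero_one_loss(v, 0))   # 0.0: argmax is class 0
    p = [0.4, 0.3, 0.3]          # no class has probability above 1/2
    print(conditional_ww_risk(p, [0.0, 0.0, 0.0]))
    print(conditional_ww_risk(p, [1.0, 0.0, 0.0]))
```

Note that the hinge loss can be positive even when the 0-1 loss is zero, since it penalizes every competing class that comes within the unit margin of the true class rather than only the top-scoring one.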