Minimizing The Misclassification Error Rate Using a Surrogate Convex Loss
This work addresses a foundational theoretical problem in machine learning and provides insight into loss function selection for binary classification. It is incremental, however, in that it builds on existing convex surrogate theory.
The paper studies binary classification with linear predictors, analyzing how minimizing a convex surrogate loss relates to minimizing the misclassification error rate. It shows that, among all convex losses, the hinge loss gives essentially the best possible bound on the misclassification error rate in terms of the best possible margin error rate.
We carefully study how well minimizing convex surrogate loss functions corresponds to minimizing the misclassification error rate for the problem of binary classification with linear predictors. In particular, we show that, amongst all convex surrogate losses, the hinge loss gives essentially the best possible bound for the misclassification error rate of the resulting linear predictor in terms of the best possible margin error rate. We also provide lower bounds for specific convex surrogates that show how different commonly used losses qualitatively differ from each other.
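To make the objects in the abstract concrete, the following is a minimal sketch of the misclassification (zero-one) loss, the margin error, and a few commonly used convex surrogates, each written as a function of the margin y * <w, x> of a linear predictor w. The function names, the unit margin threshold, and the normalizations (for example, the base-2 logarithm in the logistic loss, chosen so that it equals 1 at margin 0) are illustrative assumptions and are not taken from the paper.

```python
import math

# All losses take the margin z = y * <w, x>, with labels y in {-1, +1}.

def zero_one_loss(z):
    """Misclassification loss: 1 if the prediction is wrong (or on the boundary)."""
    return 1.0 if z <= 0 else 0.0

def margin_error(z, gamma=1.0):
    """Margin error: 1 if the margin falls below gamma (gamma=1 is an assumption)."""
    return 1.0 if z < gamma else 0.0

def hinge_loss(z):
    return max(0.0, 1.0 - z)

def logistic_loss(z):
    # Base-2 log so that the loss is 1 at z = 0 (assumed normalization).
    return math.log2(1.0 + math.exp(-z))

def squared_loss(z):
    return (1.0 - z) ** 2

def exponential_loss(z):
    return math.exp(-z)

def empirical_risk(w, data, loss):
    """Average loss of the linear predictor w over (x, y) pairs."""
    total = 0.0
    for x, y in data:
        z = y * sum(wi * xi for wi, xi in zip(w, x))
        total += loss(z)
    return total / len(data)

# Hypothetical toy data, for illustration only.
w = [1.0, -0.5]
data = [([2.0, 1.0], 1), ([0.5, 2.0], -1), ([-1.0, 0.0], 1)]

for name, loss in [("zero-one", zero_one_loss), ("hinge", hinge_loss),
                   ("logistic", logistic_loss), ("squared", squared_loss),
                   ("exponential", exponential_loss)]:
    print(name, empirical_risk(w, data, loss))
```

With these normalizations every surrogate upper-bounds the zero-one loss pointwise, so a small surrogate risk implies a small misclassification rate. The question the abstract raises is the converse one: how the misclassification rate of the surrogate minimizer compares with the best achievable margin error rate, which this sketch does not attempt to reproduce.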