A Log-linear Gradient Descent Algorithm for Unbalanced Binary Classification using the All Pairs Squared Hinge Loss
This work addresses a computational bottleneck for researchers and practitioners dealing with imbalanced binary classification, offering incremental improvements in efficiency and performance.
The paper tackles the quadratic cost of computing gradients of pairwise AUC surrogate losses for imbalanced binary classification. It proposes a new functional representation of the square and squared hinge losses, which enables gradient computation in linear time (square loss) or log-linear time (squared hinge loss). This allows the use of larger batch sizes and achieves higher test AUC values on imbalanced datasets than previous methods.
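To illustrate why a pairwise loss can have a linear-time gradient, here is a minimal sketch (not the authors' code) for the all pairs square loss. It assumes the predicted scores are split into a list `pos` for positive examples and a list `neg` for negative examples, and that each pair contributes (m - (p_i - n_j))^2 with a margin m (1 by default). Writing a_i = m - p_i, expanding the square turns the double sum into a few single sums, so no pair is ever visited. The function name `square_loss_all_pairs` and its signature are hypothetical.

```python
from typing import List, Tuple


def square_loss_all_pairs(
    pos: List[float], neg: List[float], margin: float = 1.0
) -> Tuple[float, List[float], List[float]]:
    """Hypothetical helper: sum over all (positive, negative) pairs of
    (margin - (p - n))**2 and its gradients, in O(n_pos + n_neg) time."""
    a = [margin - p for p in pos]  # a_i = margin - p_i, so each term is (a_i + n_j)^2
    n_pos, n_neg = len(pos), len(neg)
    sum_a, sum_a2 = sum(a), sum(x * x for x in a)
    sum_n, sum_n2 = sum(neg), sum(x * x for x in neg)

    # sum_{i,j} (a_i + n_j)^2 = n_neg * sum a^2 + 2 * (sum a) * (sum n) + n_pos * sum n^2
    loss = n_neg * sum_a2 + 2 * sum_a * sum_n + n_pos * sum_n2

    # dL/dp_i = sum_j 2 (a_i + n_j) * (-1)
    grad_pos = [-2 * (n_neg * ai + sum_n) for ai in a]
    # dL/dn_j = sum_i 2 (a_i + n_j)
    grad_neg = [2 * (sum_a + n_pos * nj) for nj in neg]
    return loss, grad_pos, grad_neg


if __name__ == "__main__":
    print(square_loss_all_pairs([2.0, 0.5], [0.0, 1.0]))
```

For `pos = [2.0, 0.5]` and `neg = [0.0, 1.0]` this gives a loss of 3.5, gradients [2, -4] for the positives and [-1, 3] for the negatives, matching a direct loop over the four pairs. Every pair is active in the square loss, which is why no sorting is needed here.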
Receiver Operating Characteristic (ROC) curves are plots of true positive rate versus false positive rate that are used to evaluate binary classification algorithms. Because the Area Under the Curve (AUC) is a piecewise constant function of the predicted values, learning algorithms instead optimize convex relaxations that involve a sum over all pairs of labeled positive and negative examples. Naive learning algorithms compute the gradient in quadratic time, which is too slow for learning with large batch sizes. We propose a new functional representation of the square loss and squared hinge loss, which results in algorithms that compute the gradient in either linear or log-linear time, and makes it possible to use gradient descent learning with large batch sizes. In our empirical study of supervised binary classification problems, we show that our new algorithm can achieve higher test AUC values on imbalanced data sets than previous algorithms, and can make use of larger batch sizes than were previously feasible.
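For the all pairs squared hinge loss, each pair contributes max(0, m - (p_i - n_j))^2, so only pairs with n_j > p_i - m are active and the simple expansion no longer applies to every pair. One way to keep the cost log-linear is sketched below under stated assumptions: sort the predictions, then use binary search and prefix sums to get, for each positive, the count, sum, and sum of squares of its active negatives (and symmetrically for each negative). This sort-and-prefix-sum approach is a swapped-in technique with the same O(n log n) complexity, not the paper's functional representation. The names `squared_hinge_all_pairs` and `all_pairs_brute_force` are hypothetical; the latter is the quadratic baseline, included only for checking.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def squared_hinge_all_pairs(
    pos: List[float], neg: List[float], margin: float = 1.0
) -> Tuple[float, List[float], List[float]]:
    """Hypothetical helper: sum over all pairs of max(0, margin - (p - n))**2
    and its gradients, in O((n_pos + n_neg) log(n_pos + n_neg)) time."""
    a = [margin - p for p in pos]  # pair (i, j) is active iff a[i] + neg[j] > 0

    # Prefix sums over sorted negatives give the count, sum and sum of
    # squares of any suffix (the active negatives for a given positive).
    neg_sorted = sorted(neg)
    n_neg = len(neg_sorted)
    pre1 = [0.0] + list(accumulate(neg_sorted))
    pre2 = [0.0] + list(accumulate(x * x for x in neg_sorted))

    loss = 0.0
    grad_pos = []
    for ai in a:
        k = bisect_right(neg_sorted, -ai)  # neg_sorted[k:] are exactly those with n > -ai
        count = n_neg - k
        s1 = pre1[n_neg] - pre1[k]
        s2 = pre2[n_neg] - pre2[k]
        # Expanded sum over active j of (ai + n_j)^2.
        loss += count * ai * ai + 2 * ai * s1 + s2
        grad_pos.append(-2 * (count * ai + s1))

    # Symmetric pass: for each negative, the active positives have a[i] > -n_j.
    a_sorted = sorted(a)
    n_pos = len(a_sorted)
    pre_a = [0.0] + list(accumulate(a_sorted))
    grad_neg = []
    for nj in neg:
        k = bisect_right(a_sorted, -nj)
        count = n_pos - k
        s1 = pre_a[n_pos] - pre_a[k]
        grad_neg.append(2 * (s1 + count * nj))

    return loss, grad_pos, grad_neg


def all_pairs_brute_force(pos, neg, margin=1.0):
    """Quadratic-time reference: loop over every (positive, negative) pair."""
    loss = 0.0
    grad_pos = [0.0] * len(pos)
    grad_neg = [0.0] * len(neg)
    for i, p in enumerate(pos):
        for j, n in enumerate(neg):
            d = max(0.0, margin - p + n)  # hinge on the pairwise margin
            loss += d * d
            grad_pos[i] -= 2 * d
            grad_neg[j] += 2 * d
    return loss, grad_pos, grad_neg


if __name__ == "__main__":
    pos, neg = [2.0, 0.5], [0.0, 1.0]
    print(squared_hinge_all_pairs(pos, neg))
    print(all_pairs_brute_force(pos, neg))
```

On `pos = [2.0, 0.5]` and `neg = [0.0, 1.0]` both functions give a loss of 2.5, gradients [0, -4] for the positives and [1, 3] for the negatives: the positive scored 2.0 already beats both negatives by at least the margin, so it receives no gradient. Sorting dominates the running time, which is where the log factor comes from.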