Scaling Up ROC-Optimizing Support Vector Machines
This work addresses a computational bottleneck for researchers and practitioners using ROC-SVM in imbalanced classification, making it more practical for large datasets.
The paper tackled the high computational cost of ROC-SVM, which scales quadratically with data size, by developing a scalable variant using incomplete U-statistics and low-rank kernel approximations, achieving comparable AUC performance with drastically reduced training time.
The ROC-SVM, originally proposed by Rakotomamonjy, directly maximizes the area under the ROC curve (AUC) and has become an attractive alternative of the conventional binary classification under the presence of class imbalance. However, its practical use is limited by high computational cost, as training involves evaluating all $O(n^2)$. To overcome this limitation, we develop a scalable variant of the ROC-SVM that leverages incomplete U-statistics, thereby substantially reducing computational complexity. We further extend the framework to nonlinear classification through a low-rank kernel approximation, enabling efficient training in reproducing kernel Hilbert spaces. Theoretical analysis establishes an error bound that justifies the proposed approximation, and empirical results on both synthetic and real datasets demonstrate that the proposed method achieves comparable AUC performance to the original ROC-SVM with drastically reduced training time.