LG | Mar 19, 2025

Robust Support Vector Machines for Imbalanced and Noisy Data via Benders Decomposition

arXiv:2503.14873v1 | 3 citations | h-index: 7 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses classification challenges for imbalanced and noisy data, offering incremental improvements over existing SVM methods.

The study addresses class imbalance and noise in Support Vector Machines with a formulation that minimizes the number of constraint violations rather than their magnitude. It reports improved F1-scores for the minority class and higher classification accuracy on noisy datasets, both statistically significant (p < 0.05).

This study introduces a novel formulation to enhance Support Vector Machines (SVMs) in handling class imbalance and noise. Unlike the conventional Soft Margin SVM, which penalizes the magnitude of constraint violations, the proposed model quantifies the number of violations and aims to minimize their frequency. To achieve this, a binary variable is incorporated into the objective function of the primal SVM formulation, replacing the traditional slack variable. Furthermore, each misclassified sample is assigned a priority and an associated constraint. The resulting formulation is a mixed-integer programming model, efficiently solved using Benders decomposition. The proposed model's performance was benchmarked against existing models, including Soft Margin SVM, weighted SVM, and NuSVC. Two primary hypotheses were examined: 1) The proposed model improves the F1-score for the minority class in imbalanced classification tasks. 2) The proposed model enhances classification accuracy in noisy datasets. These hypotheses were evaluated using a Wilcoxon test across multiple publicly available datasets from the OpenML repository. The results supported both hypotheses (p < 0.05). In addition, the proposed model exhibited several interesting properties, such as improved robustness to noise, a decision boundary shift favoring the minority class, a reduced number of support vectors, and decreased prediction time. The open-source Python implementation of the proposed SVM model is available.
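The contrast between penalizing the magnitude of violations and counting them can be illustrated with a minimal sketch. It is not the paper's mixed-integer model or its Benders decomposition: it only evaluates both penalties for a fixed, hand-picked hyperplane on toy data. The function names, the data, and the choice of `w` and `b` are all assumptions made for illustration.

```python
def margins(w, b, X, y):
    """Functional margins y_i * (w . x_i + b) for labels y_i in {-1, +1}."""
    return [
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        for xi, yi in zip(X, y)
    ]


def hinge_penalty(w, b, X, y):
    """Soft-margin style penalty: total magnitude of margin violations."""
    return sum(max(0.0, 1.0 - m) for m in margins(w, b, X, y))


def violation_count(w, b, X, y):
    """Count-based penalty: number of samples whose margin is below 1."""
    return sum(1 for m in margins(w, b, X, y) if m < 1.0)


# Hypothetical 1-D toy data: three clean points and one distant,
# mislabeled point standing in for label noise.
X = [[2.0], [3.0], [-2.0], [10.0]]
y = [1, 1, -1, -1]  # the last label is the noisy one

# Hypothetical fixed hyperplane, chosen by hand rather than by training.
w, b = [1.0], 0.0

hinge = hinge_penalty(w, b, X, y)    # grows with the outlier's distance
count = violation_count(w, b, X, y)  # each violating sample counts once
```

Here the single noisy point contributes 11 to the magnitude-based penalty but only 1 to the count, which gives some intuition for why a count-based objective may be less sensitive to distant mislabeled samples. In an optimization model the count would typically be expressed with one binary indicator per sample, which is what makes the problem mixed-integer.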
