Training Support Vector Machines using Coresets
This work makes SVM training more scalable for machine learning practitioners by incrementally improving the efficiency of coreset construction.
The paper tackles the computational cost of training Support Vector Machines (SVMs). It introduces a coreset construction algorithm that uses importance sampling to select a weighted subset of the data points. Compared with state-of-the-art methods, the algorithm achieves computational speedups with low approximation error.
We present a novel coreset construction algorithm for solving classification tasks using Support Vector Machines (SVMs) in a computationally efficient manner. A coreset is a weighted subset of the original data points that provably approximates the original set. We show that coresets of size polylogarithmic in $n$ and polynomial in $d$ exist for a set of $n$ input points with $d$ features, and we present an $(ε,δ)$-FPRAS for constructing coresets for scalable SVM training. Our method leverages the insight that data points are often redundant, and it uses an importance sampling scheme based on the sensitivity of each data point to construct coresets efficiently. We evaluate how well our algorithm accelerates SVM training on real-world data sets, comparing it against a state-of-the-art coreset approach and uniform sampling. Our empirical results show that our approach outperforms both in enabling computational speedups while achieving low approximation error.
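The importance sampling scheme described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not state the sensitivity bounds, so `sensitivity_proxy` below is a hypothetical stand-in (a uniform term plus a term proportional to each point's distance from its class mean), and the names `build_coreset`, `m`, and `rng` are assumptions made for illustration. What the sketch does show is the general pattern: sample points with probability proportional to an importance score, then reweight each sample by the inverse of its sampling probability so that weighted sums over the coreset are unbiased estimates of sums over the full data set.

```python
import numpy as np


def sensitivity_proxy(X, y):
    """Hypothetical per-point importance score (NOT the paper's sensitivity bound).

    Each point gets a uniform share within its class plus a share proportional
    to its distance from the class mean.
    """
    scores = np.empty(X.shape[0])
    for label in np.unique(y):
        mask = y == label
        k = mask.sum()
        center = X[mask].mean(axis=0)
        dist = np.linalg.norm(X[mask] - center, axis=1)
        total = dist.sum()
        # Fall back to the uniform term alone if all points coincide.
        scores[mask] = 1.0 / k + (dist / total if total > 0 else 0.0)
    return scores


def build_coreset(X, y, m, rng):
    """Sample m points with probability proportional to their score.

    Returns the sampled points, their labels, and weights 1 / (m * p_i),
    which make weighted sums over the sample unbiased for the full sums.
    """
    s = sensitivity_proxy(X, y)
    p = s / s.sum()
    idx = rng.choice(X.shape[0], size=m, replace=True, p=p)
    weights = 1.0 / (m * p[idx])
    return X[idx], y[idx], weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    y = np.where(X[:, 0] > 0, 1, -1)
    Xc, yc, w = build_coreset(X, y, m=200, rng=rng)
    print(Xc.shape, yc.shape, w.shape)
```

The resulting weighted subset can then be handed to any SVM solver that accepts per-sample weights, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`. Mixing a uniform term into the score keeps every sampling probability bounded away from zero, which in turn bounds the largest weight.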