Generic Coreset for Scalable Learning of Monotonic Kernels: Logistic Regression, Sigmoid and more
This work addresses scalability issues in machine learning for researchers and practitioners using monotonic kernels like logistic regression, though it is incremental as it builds on existing coreset theory.
The paper tackles the problem of constructing small coresets for scalable learning with monotonic kernel functions, proving that under natural assumptions, small coresets exist for any input and providing an algorithm to compute them efficiently, with experimental results showing coresets are much smaller in practice than theoretical predictions.
Coreset (or core-set) is a small weighted \emph{subset} $Q$ of an input set $P$ with respect to a given \emph{monotonic} function $f:\mathbb{R}\to\mathbb{R}$ that \emph{provably} approximates its fitting loss $\sum_{p\in P}f(p\cdot x)$ to \emph{any} given $x\in\mathbb{R}^d$. Using $Q$ we can obtain approximation of $x^*$ that minimizes this loss, by running \emph{existing} optimization algorithms on $Q$. In this work we provide: (i) A lower bound which proves that there are sets with no coresets smaller than $n=|P|$ for general monotonic loss functions. (ii) A proof that, under a natural assumption that holds e.g. for logistic regression and the sigmoid activation functions, a small coreset exists for \emph{any} input $P$. (iii) A generic coreset construction algorithm that computes such a small coreset $Q$ in $O(nd+n\log n)$ time, and (iv) Experimental results which demonstrate that our coresets are effective and are much smaller in practice than predicted in theory.