LGMLJun 11, 2019

Fast and Accurate Least-Mean-Squares Solvers

arXiv:1906.04705v236 citationsHas Code
AI Analysis

This work addresses a fundamental bottleneck in machine learning for practitioners using LMS solvers, offering significant speed-ups with practical applications.

The paper tackles the problem of efficiently computing weighted subsets for least-mean squares solvers, achieving a time complexity of O(nd + d^4 log n) and demonstrating up to 100x performance improvements in existing solvers.

Least-mean squares (LMS) solvers such as Linear / Ridge / Lasso-Regression, SVD and Elastic-Net not only solve fundamental machine learning problems, but are also the building blocks in a variety of other methods, such as decision trees and matrix factorizations. We suggest an algorithm that gets a finite set of $n$ $d$-dimensional real vectors and returns a weighted subset of $d+1$ vectors whose sum is \emph{exactly} the same. The proof in Caratheodory's Theorem (1907) computes such a subset in $O(n^2d^2)$ time and thus not used in practice. Our algorithm computes this subset in $O(nd+d^4\log{n})$ time, using $O(\log n)$ calls to Caratheodory's construction on small but "smart" subsets. This is based on a novel paradigm of fusion between different data summarization techniques, known as sketches and coresets. For large values of $d$, we suggest a faster construction that takes $O(nd)$ time (linear in the input's size) and returns a weighted subset of $O(d)$ sparsified input points. Here, sparsified point means that some of its entries were replaced by zeroes. As an example application, we show how it can be used to boost the performance of existing LMS solvers, such as those in scikit-learn library, up to x100. Generalization for streaming and distributed (big) data is trivial. Extensive experimental results and complete open source code are also provided.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes