Input Sparsity Time Low-Rank Approximation via Ridge Leverage Score Sampling
This work provides a new algorithmic approach to efficient low-rank approximation. The contribution is incremental, but it offers practical benefits for handling large-scale sparse and streaming data in machine learning and data analysis.
The paper computes near-optimal low-rank approximations of matrices in input-sparsity time, $O(nnz(A))$, using a recursive sampling scheme. The scheme matches the guarantees of prior methods while running faster on sparse data and extending to streaming settings.
We present a new algorithm for finding a near-optimal low-rank approximation of a matrix $A$ in $O(nnz(A))$ time, where $nnz(A)$ is the number of nonzero entries of $A$. Our method is based on a recursive sampling scheme for computing a representative subset of $A$'s columns, which is then used to find a low-rank approximation. This approach differs substantially from prior $O(nnz(A))$-time algorithms, which are all based on fast Johnson-Lindenstrauss random projections. It matches the guarantees of these methods while offering a number of advantages. Not only are sampling algorithms faster for sparse and structured data, but they can also be applied in settings where random projections cannot. For example, we give new single-pass streaming algorithms for the column subset selection and projection-cost-preserving sample problems. Our method has also been used to give the fastest algorithms for provably approximating kernel matrices [MM16].
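The abstract leaves the sampling distribution implicit, but the title names it: ridge leverage scores. The Python below is a minimal sketch, not the paper's algorithm. It assumes the common definition $\tau_i = a_i^T (AA^T + \lambda I)^{-1} a_i$ for each column $a_i$ of $A$, with a user-supplied regularizer $\lambda > 0$. For a rank-$k$ target, $\lambda$ is typically tied to the tail $\|A - A_k\|_F^2 / k$. The sketch computes exact scores with a small Cholesky solver and keeps each column independently with probability $p_i = \min(1, c\,\tau_i)$. The oversampling factor $c$ (`oversample`), the function names, and the $1/\sqrt{p_i}$ weights are illustrative assumptions. Exact scores cost $O(n^2 d + n^3)$ operations regardless of sparsity, far from $O(nnz(A))$. The paper's recursive sampling scheme is what avoids this cost.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def solve_spd(m: Matrix, b: List[float]) -> List[float]:
    """Solve m x = b for a symmetric positive definite m via Cholesky."""
    n = len(m)
    # Factor m = L L^T with L lower triangular.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s ** 0.5 if i == j else s / L[j][j]
    # Forward substitution: L y = b.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def ridge_leverage_scores(a: Matrix, lam: float) -> List[float]:
    """Exact tau_i = a_i^T (A A^T + lam I)^{-1} a_i for each column a_i of A (n x d).

    Dense O(n^2 d + n^3) computation; lam > 0 keeps A A^T + lam I positive definite.
    """
    n, d = len(a), len(a[0])
    # Build the n x n matrix A A^T + lam * I.
    m = [[sum(a[i][t] * a[j][t] for t in range(d)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    scores = []
    for col in range(d):
        a_col = [a[i][col] for i in range(n)]
        x = solve_spd(m, a_col)
        scores.append(sum(u * v for u, v in zip(a_col, x)))
    return scores


def sample_columns(a: Matrix, lam: float, oversample: float,
                   seed: int = 0) -> List[Tuple[int, float]]:
    """Keep column i independently with probability p_i = min(1, oversample * tau_i).

    Returns (column index, weight 1/sqrt(p_i)) pairs for the kept columns.
    """
    rng = random.Random(seed)
    kept = []
    for i, tau in enumerate(ridge_leverage_scores(a, lam)):
        p = min(1.0, oversample * tau)
        if p > 0 and rng.random() < p:
            kept.append((i, 1.0 / p ** 0.5))
    return kept


if __name__ == "__main__":
    A = [[1.0, 0.0, 2.0, 0.0],
         [0.0, 1.0, 0.0, 3.0],
         [1.0, 1.0, 0.0, 0.0]]
    print("ridge scores:", [round(t, 3) for t in ridge_leverage_scores(A, lam=1.0)])
    print("kept (index, weight):", sample_columns(A, lam=1.0, oversample=2.0))
```

With $\lambda > 0$ every score lies in $[0, 1)$, and the scores sum to $\sum_j \sigma_j^2 / (\sigma_j^2 + \lambda)$, which is below $\mathrm{rank}(A)$. That sum bounds the expected sample size at $c$ times that total. Scaling each kept column by $1/\sqrt{p_i}$ makes $CC^T$ an unbiased estimate of $AA^T$, the usual choice when the sample must preserve projection costs. For plain column subset selection the weights can be ignored.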