NA, ML · May 4, 2015

An Explicit Sampling Dependent Spectral Error Bound for Column Subset Selection

Tianbao Yang, Lijun Zhang, Rong Jin, Shenghuo Zhu

arXiv:1505.00526v14.321 citations

Originality: Incremental advance

AI Analysis

This work addresses column subset selection for low-rank matrix approximation, offering incremental improvements in error bounds and sampling strategies for applications in data analysis and machine learning.

The paper tackles the problem of column subset selection by analyzing a randomized algorithm's spectral norm reconstruction error, establishing a new bound that explicitly depends on sampling probabilities. It shows that a distribution proportional to the square root of leverage scores outperforms uniform and leverage-based sampling in certain cases, with numerical simulations demonstrating improved performance for low-rank matrix and least squares approximation compared to state-of-the-art methods.
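The three distributions compared above can be computed from the statistical leverage scores of the columns. The sketch below is illustrative, not the authors' code. It assumes the leverage scores are taken relative to the top-k right singular subspace, so the score of column i is the squared norm of the i-th column of V_k^T. The helper names `leverage_scores` and `sampling_distributions` are hypothetical.

```python
import numpy as np

def leverage_scores(A, k):
    """Leverage scores of the columns of A relative to its top-k right
    singular subspace (assumed definition): squared column norms of V_k^T.
    They are nonnegative and sum to k."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk_t = Vt[:k, :]                   # k x n, orthonormal rows
    return np.sum(Vk_t ** 2, axis=0)   # one score per column

def sampling_distributions(A, k):
    """Uniform, leverage-based and square-root-leverage column sampling
    distributions, each returned as a probability vector of length n."""
    n = A.shape[1]
    ell = leverage_scores(A, k)
    sqrt_ell = np.sqrt(ell)
    return {
        "uniform": np.full(n, 1.0 / n),
        "leverage": ell / ell.sum(),
        "sqrt_leverage": sqrt_ell / sqrt_ell.sum(),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 200))
    for name, p in sampling_distributions(A, k=5).items():
        print(name, "max probability:", p.max())
```

Taking the square root flattens the scores: the largest square-root-leverage probability never exceeds the largest leverage probability, yet it still puts more mass than uniform sampling on high-leverage columns. This is one intuitive reading of why the square-root distribution can sit between the other two.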

In this paper, we consider the problem of column subset selection. We present a novel analysis of the spectral norm reconstruction error of a simple randomized algorithm and establish a new bound that depends explicitly on the sampling probabilities. The sampling dependent error bound (i) allows us to better understand the tradeoff in the reconstruction error due to sampling probabilities, (ii) offers more insight than existing error bounds that exploit specific probability distributions, and (iii) implies better sampling distributions. In particular, we show that a sampling distribution with probabilities proportional to the square root of the statistical leverage scores is always better than uniform sampling and is better than leverage-based sampling when the statistical leverage scores are very nonuniform. Moreover, by solving a constrained optimization problem related to the error bound with an efficient bisection search, we achieve better performance than either the leverage-based distribution or the distribution proportional to the square root of the statistical leverage scores. Numerical simulations demonstrate the benefits of the new sampling distributions for low-rank matrix approximation and least squares approximation compared to state-of-the-art algorithms.
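A "simple randomized algorithm" for column subset selection can be sketched as follows: draw c column indices i.i.d. from a distribution p, form C from the selected columns, and measure the spectral norm of the residual A - CC⁺A. Several details here are assumptions, not taken from the paper: sampling with replacement, the projection-based reconstruction CC⁺A, and the synthetic test matrix. The helpers `sample_columns`, `spectral_reconstruction_error` and `compare_distributions` are hypothetical. The constrained optimization and bisection search from the abstract are not reproduced, because their exact form is not given here.

```python
import numpy as np

def sample_columns(A, probs, c, rng):
    """Draw c column indices i.i.d. (with replacement) from probs and
    return the selected columns of A, with duplicate indices removed."""
    idx = rng.choice(A.shape[1], size=c, replace=True, p=probs)
    return A[:, np.unique(idx)]

def spectral_reconstruction_error(A, C):
    """Spectral norm of A - C C^+ A, the residual after orthogonally
    projecting A onto the span of the selected columns C."""
    P = C @ np.linalg.pinv(C) @ A      # projection of A onto range(C)
    return np.linalg.norm(A - P, 2)    # ord=2: largest singular value

def compare_distributions(A, k, c, trials=20, seed=0):
    """Average spectral error over several trials for the uniform,
    leverage-based and square-root-leverage distributions."""
    rng = np.random.default_rng(seed)
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    ell = np.sum(Vt[:k, :] ** 2, axis=0)   # rank-k leverage scores (assumed)
    n = A.shape[1]
    dists = {
        "uniform": np.full(n, 1.0 / n),
        "leverage": ell / ell.sum(),
        "sqrt_leverage": np.sqrt(ell) / np.sqrt(ell).sum(),
    }
    return {
        name: float(np.mean([
            spectral_reconstruction_error(A, sample_columns(A, p, c, rng))
            for _ in range(trials)
        ]))
        for name, p in dists.items()
    }

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    m, n, k = 40, 300, 5
    # Heavy-tailed column scales make the leverage scores nonuniform.
    weights = rng.pareto(1.5, size=n) + 1.0
    A = (rng.standard_normal((m, k)) @ rng.standard_normal((k, n))) * weights
    A += 0.01 * rng.standard_normal((m, n))
    print(compare_distributions(A, k=k, c=15))
```

Which distribution wins in such a toy run depends on the matrix, on c and on the random seed. The sketch only shows how such a comparison can be set up. It does not reproduce the paper's simulations.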

