Provable Deterministic Leverage Score Sampling
This provides a deterministic alternative to randomized matrix approximation methods, which is incremental but useful for applications requiring reproducibility and efficiency in data analysis.
The paper tackles the problem of approximating matrices by deterministically selecting columns with the largest leverage scores, showing that this method can be as accurate as randomized sampling under a power-law decay assumption, with empirical evidence that it often matches or outperforms state-of-the-art techniques.
We explain theoretically a curious empirical phenomenon: "Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate". To obtain provable guarantees, previous work requires randomized sampling of the columns with probabilities proportional to their leverage scores. In this work, we provide a novel theoretical analysis of deterministic leverage score sampling. We show that such deterministic sampling can be provably as accurate as its randomized counterparts, if the leverage scores follow a moderately steep power-law decay. We support this power-law assumption by providing empirical evidence that such decay laws are abundant in real-world data sets. We then demonstrate empirically the performance of deterministic leverage score sampling, which many times matches or outperforms the state-of-the-art techniques.