DS NA NA Oct 14, 2015

Column Selection via Adaptive Sampling

Saurabh Paul, Malik Magdon-Ismail, Petros Drineas

arXiv:1510.04149 (23 citations)

Originality: Synthesis-oriented

AI Analysis

For practitioners needing efficient column subset selection in large-scale data analysis, this work provides a practical enhancement to existing algorithms.

The paper proposes an adaptive sampling algorithm that improves any relative-error column selection method, achieving tighter approximation error bounds and outperforming non-adaptive and prior adaptive sampling approaches on synthetic and real-world data.

Selecting a good column (or row) subset of massive data matrices has found many applications in data analysis and machine learning. We propose a new adaptive sampling algorithm that can be used to improve any relative-error column selection algorithm. Our algorithm delivers a tighter theoretical bound on the approximation error, which we also demonstrate empirically using two well-known relative-error column subset selection algorithms. Our experimental results on synthetic and real-world data show that our algorithm outperforms non-adaptive sampling as well as prior adaptive sampling approaches.
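
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general adaptive idea: pick columns in several rounds, and before each round recompute the residual of the matrix after projecting onto the columns chosen so far. It is not the authors' method. As a stand-in for a relative-error base selector, it samples columns with probability proportional to their squared residual norms, and the function name `adaptive_column_sampling` and its parameters `c`, `rounds`, and `seed` are hypothetical.

```python
import numpy as np


def adaptive_column_sampling(A, c, rounds, seed=0):
    """Pick up to c column indices of A over `rounds` passes (illustrative sketch).

    Assumption: squared residual column norms are used as sampling
    probabilities; this stands in for a relative-error base selector.
    """
    if rounds < 1 or c < rounds:
        raise ValueError("need rounds >= 1 and c >= rounds")
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    per_round = c // rounds  # any remainder of c / rounds is left unused
    selected = []
    residual = A.astype(float).copy()
    for _ in range(rounds):
        # Squared column norms of the current residual.
        norms = np.sum(residual ** 2, axis=0)
        if selected:
            # Already-chosen columns have (numerically) zero residual.
            norms[selected] = 0.0
        total = norms.sum()
        if total <= 1e-12:
            break  # A is already captured by the chosen columns
        probs = norms / total
        k = min(per_round, int(np.count_nonzero(probs)))
        picked = rng.choice(n, size=k, replace=False, p=probs)
        selected.extend(picked.tolist())
        # Residual after projecting A onto the span of the chosen columns.
        C = A[:, selected]
        residual = A - C @ (np.linalg.pinv(C) @ A)
    return selected
```

Setting `rounds=1` reduces this sketch to a single non-adaptive pass of norm-based sampling, which gives a simple baseline for comparing the reconstruction error `||A - C C^+ A||_F` against the multi-round variant.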

