NoisyCUR: An algorithm for two-cost budgeted matrix completion
The work addresses a practical problem for experimentalists in machine learning and data analysis whose data collection is limited by cost, and it offers an incremental improvement over existing methods.
The paper tackles matrix completion under a budget constraint with two sampling modalities differing in noise and cost, introducing a regression-based algorithm that outperforms standard methods at low budgets and matches their accuracy with less computation at high budgets.
Matrix completion is a ubiquitous tool in machine learning and data analysis. Most work in this area has focused on the number of observations necessary to obtain an accurate low-rank approximation. In practice, however, the cost of observations is an important limiting factor, and experimentalists may have several modes of observation available, each with a different noise-vs-cost trade-off. This paper considers matrix completion subject to such constraints: a budget is imposed, and the experimentalist's goal is to allocate this budget between two sampling modalities so as to recover an accurate low-rank approximation. Specifically, we assume that it is possible to obtain either low-noise, high-cost observations of individual entries or high-noise, low-cost observations of entire columns. We introduce a regression-based completion algorithm for this setting and experimentally verify its performance on both synthetic and real data sets. When the budget is low, our algorithm outperforms standard completion algorithms. When the budget is high, our algorithm achieves error comparable to that of standard nuclear norm completion algorithms while requiring much less computational effort.
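To make the two-cost setting concrete, the following is a minimal NumPy sketch of one way a regression-based completion under a budget could look. It is an illustration under stated assumptions, not the paper's exact NoisyCUR procedure: the function name `two_cost_completion_sketch`, the cost parameters `col_cost` and `entry_cost`, the additive Gaussian noise model, the uniform choice of rows, and the ridge parameter are all hypothetical choices made for this example.

```python
import numpy as np


def two_cost_completion_sketch(M, budget, col_cost, entry_cost, n_cols,
                               rank, noise_col=0.5, noise_entry=0.01,
                               ridge=1e-3, seed=0):
    """Illustrative two-cost completion (assumed setup, not the paper's code).

    M is the ground-truth matrix; it is only accessed through simulated
    noisy observations. Part of the budget buys `n_cols` cheap, noisy
    columns; the rest buys expensive, low-noise entries, taken here as
    full rows chosen uniformly at random (an assumption).
    """
    rng = np.random.default_rng(seed)
    m, n = M.shape

    # Split the budget between the two sampling modalities.
    col_spend = n_cols * col_cost
    if col_spend > budget:
        raise ValueError("budget too small for the requested columns")
    n_entries = int((budget - col_spend) // entry_cost)
    n_rows = min(m, n_entries // n)  # each sampled row costs n entries
    if n_rows < 1:
        raise ValueError("budget too small to observe a full row")
    if rank > n_cols:
        raise ValueError("rank cannot exceed the number of sampled columns")

    # Modality 1: high-noise, low-cost observations of entire columns.
    cols = rng.choice(n, size=n_cols, replace=False)
    C = M[:, cols] + noise_col * rng.standard_normal((m, n_cols))

    # Estimate the column space from the noisy columns.
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    U = U[:, :rank]

    # Modality 2: low-noise, high-cost observations of individual entries.
    rows = rng.choice(m, size=n_rows, replace=False)
    R = M[rows, :] + noise_entry * rng.standard_normal((n_rows, n))

    # Ridge regression: fit every column's coefficients in the basis U
    # using only the rows where low-noise entries were purchased.
    U_s = U[rows, :]
    gram = U_s.T @ U_s + ridge * np.eye(rank)
    X = np.linalg.solve(gram, U_s.T @ R)

    return U @ X


# Example usage on a synthetic rank-3 matrix.
demo_rng = np.random.default_rng(1)
M_true = demo_rng.standard_normal((60, 3)) @ demo_rng.standard_normal((3, 40))
M_hat = two_cost_completion_sketch(M_true, budget=700.0, col_cost=5.0,
                                   entry_cost=1.0, n_cols=20, rank=3,
                                   noise_col=0.1)
rel_err = np.linalg.norm(M_hat - M_true) / np.linalg.norm(M_true)
print(f"relative Frobenius error: {rel_err:.3f}")
```

In this sketch the only expensive numerical steps are one SVD of a tall, thin matrix and one small linear solve, which is in the spirit of the computational advantage over nuclear norm completion that the abstract reports. How the budget should be split between columns and entries is left as a free parameter here (`n_cols`).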