DSCCLGNov 4, 2018

Towards a Zero-One Law for Column Subset Selection

arXiv:1811.01442v235 citations
Originality Highly original
AI Analysis

This provides a foundational theoretical framework for generalized low rank approximation, addressing a central question in machine learning and data analysis.

The paper tackles the problem of approximating low rank matrices for arbitrary error functions in column subset selection, showing that efficient algorithms exist for every approximately monotone function satisfying an approximate triangle inequality, and proving these conditions are necessary, with results including handling non-scale-invariant functions like Huber loss.

There are a number of approximation algorithms for NP-hard versions of low rank approximation, such as finding a rank-$k$ matrix $B$ minimizing the sum of absolute values of differences to a given $n$-by-$n$ matrix $A$, $\min_{\textrm{rank-}k~B}\|A-B\|_1$, or more generally finding a rank-$k$ matrix $B$ which minimizes the sum of $p$-th powers of absolute values of differences, $\min_{\textrm{rank-}k~B}\|A-B\|_p^p$. Many of these algorithms are linear time columns subset selection algorithms, returning a subset of $\mathrm{poly}(k \log n)$ columns whose cost is no more than a $\mathrm{poly}(k)$ factor larger than the cost of the best rank-$k$ matrix. The above error measures are special cases of the following general entrywise low rank approximation problem: given an arbitrary function $g:\mathbb{R} \rightarrow \mathbb{R}_{\geq 0}$, find a rank-$k$ matrix $B$ which minimizes $\|A-B\|_g = \sum_{i,j}g(A_{i,j}-B_{i,j})$. A natural question is which functions $g$ admit efficient approximation algorithms? Indeed, this is a central question of recent work studying generalized low rank models. In this work we give approximation algorithms for $\textit{every}$ function $g$ which is approximately monotone and satisfies an approximate triangle inequality, and we show both of these conditions are necessary. Further, our algorithm is efficient if the function $g$ admits an efficient approximate regression algorithm. Our approximation algorithms handle functions which are not even scale-invariant, such as the Huber loss function, which we show have very different structural properties than $\ell_p$-norms, e.g., one can show the lack of scale-invariance causes any column subset selection algorithm to provably require a $\sqrt{\log n}$ factor larger number of columns than $\ell_p$-norms; nevertheless we design the first efficient column subset selection algorithms for such error measures.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes