NALGJul 3, 2024

When big data actually are low-rank, or entrywise approximation of certain function-generated matrices

arXiv:2407.03250v45 citationsh-index: 4
AI Analysis

This clarifies theoretical limits for low-rank methods in big data and neural networks, but is incremental as it refines existing claims rather than introducing new paradigms.

The paper addresses misconceptions about low-rank approximation of matrices from smooth functions, showing that for specific classes like inner product or distance functions, entrywise error ε can be achieved with rank O(log(n) ε^{-2} log(ε^{-1})) independent of dimension m, and extends this to tensor approximations.

The article concerns low-rank approximation of matrices generated by sampling a smooth function of two $m$-dimensional variables. We identify several misconceptions surrounding a claim that, for a specific class of analytic functions, such $n \times n$ matrices admit accurate entrywise approximation of rank that is independent of $m$ and grows as $\log(n)$ -- colloquially known as ''big-data matrices are approximately low-rank''. We provide a theoretical explanation of the numerical results presented in support of this claim, describing three narrower classes of functions for which function-generated matrices can be approximated within an entrywise error of order $\varepsilon$ with rank $\mathcal{O}(\log(n) \varepsilon^{-2} \log(\varepsilon^{-1}))$ that is independent of the dimension $m$: (i) functions of the inner product of the two variables, (ii) functions of the Euclidean distance between the variables, and (iii) shift-invariant positive-definite kernels. We extend our argument to tensor-train approximation of tensors generated with functions of the ''higher-order inner product'' of their multiple variables. We discuss our results in the context of low-rank approximation of (a) growing datasets and (b) attention in transformer neural networks.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes