DS LGAug 16, 2024

Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms

arXiv:2408.08494v13.32 citationsh-index: 7

Originality Highly original

AI Analysis

This work addresses efficient approximation of residuals in low-rank and sparse approximations, which is incremental but offers improved bounds and empirical advantages for applications like data streaming and heavy hitter detection.

The paper tackles the problem of residual error estimation for matrix and vector norms using linear sketches, providing tight bounds: for matrices, it improves the sketch size to Θ(k²/ε⁴) with sparse matrices for fast updates, and for vectors, it gives an upper bound of O(k^{2/p}n^{1-2/p} poly(log n)) for ℓ_p-norms with p>2, along with matching lower bounds.

We study the problem of residual error estimation for matrix and vector norms using a linear sketch. Such estimates can be used, for example, to quickly assess how useful a more expensive low-rank approximation computation will be. The matrix case concerns the Frobenius norm and the task is to approximate the $k$-residual $\|A - A_k\|_F$ of the input matrix $A$ within a $(1+ε)$-factor, where $A_k$ is the optimal rank-$k$ approximation. We provide a tight bound of $Θ(k^2/ε^4)$ on the size of bilinear sketches, which have the form of a matrix product $SAT$. This improves the previous $O(k^2/ε^6)$ upper bound in (Andoni et al. SODA 2013) and gives the first non-trivial lower bound, to the best of our knowledge. In our algorithm, our sketching matrices $S$ and $T$ can both be sparse matrices, allowing for a very fast update time. We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work. For the vector case, we consider the $\ell_p$-norm for $p>2$, where the task is to approximate the $k$-residual $\|x - x_k\|_p$ up to a constant factor, where $x_k$ is the optimal $k$-sparse approximation to $x$. Such vector norms are frequently studied in the data stream literature and are useful for finding frequent items or so-called heavy hitters. We establish an upper bound of $O(k^{2/p}n^{1-2/p}\operatorname{poly}(\log n))$ for constant $ε$ on the dimension of a linear sketch for this problem. Our algorithm can be extended to the $\ell_p$ sparse recovery problem with the same sketching dimension, which seems to be the first such bound for $p > 2$. We also show an $Ω(k^{2/p}n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.

View on arXiv PDF

Similar