DS, CC, LG · Jul 18, 2018

Approximation Schemes for Low-Rank Binary Matrix Approximation Problems

arXiv:1807.07156v1 · 30 citations
Originality: Highly original
AI Analysis

This work addresses fundamental problems in computational theory and data analysis, offering substantial improvements in running times and approximation factors over previous methods, though it is incremental in advancing algorithmic techniques.

The paper tackles the problem of clustering binary vectors and low-rank approximation of binary matrices, providing a randomized linear time approximation scheme that yields a (1+ε)-approximate solution for problems like Low GF(2)-Rank Approximation, with running time f(r,ε)·n·m and probability at least (1-1/e).

We provide a randomized linear-time approximation scheme for a generic problem about clustering of binary vectors subject to additional constraints. The new constrained clustering problem encompasses a number of problems, and by solving it we obtain the first linear-time approximation schemes for a number of well-studied fundamental problems concerning clustering of binary vectors and low-rank approximation of binary matrices. Among the problems solvable by our approach are \textsc{Low GF(2)-Rank Approximation}, \textsc{Low Boolean-Rank Approximation}, and various versions of \textsc{Binary Clustering}. For example, in the \textsc{Low GF(2)-Rank Approximation} problem, given an $m\times n$ binary matrix $A$ and an integer $r>0$, we seek a binary matrix $B$ of GF(2)-rank at most $r$ such that the $\ell_0$ norm of the matrix $A-B$ is minimum. For this problem our algorithm, for any $ε>0$, runs in time $f(r,ε)\cdot n\cdot m$, where $f$ is some computable function, and outputs a $(1+ε)$-approximate solution with probability at least $(1-\frac{1}{e})$. Our approximation algorithms substantially improve the running times and approximation factors of previous works. We also give (deterministic) PTASes for these problems running in time $n^{f(r)\frac{1}{ε^2}\log \frac{1}{ε}}$, where $f$ is some function depending on the problem. Our algorithm for the constrained clustering problem is based on a novel sampling lemma, which is interesting in its own right.
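To make the objective of Low GF(2)-Rank Approximation concrete, here is a minimal Python sketch that computes the GF(2)-rank of a binary matrix, the $\ell_0$ distance between two binary matrices, and an optimal rank-$r$ approximation by exhaustive search. This is not the paper's sampling-based scheme: the search enumerates every factorization $B = UV$ over GF(2) and is exponential in $(m+n)r$, so it is only usable on tiny inputs. All function names (`gf2_rank`, `l0_distance`, `brute_force_low_gf2_rank`) are hypothetical and chosen for illustration.

```python
from itertools import product


def gf2_rank(rows):
    """Rank over GF(2) of a binary matrix given as a list of 0/1 rows."""
    basis = []  # independent rows stored as integer bitmasks
    for row in rows:
        v = 0
        for bit in row:
            v = (v << 1) | bit
        # Reduce v against the basis; XOR is addition over GF(2).
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)


def l0_distance(A, B):
    """Number of entries in which A and B differ (the l0 norm of A - B)."""
    return sum(a != b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def brute_force_low_gf2_rank(A, r):
    """Illustrative exhaustive search, NOT the paper's algorithm.

    Every matrix of GF(2)-rank at most r can be written as U @ V mod 2
    with U of size m x r and V of size r x n, so we try all such pairs.
    Returns (best_cost, best_B).
    """
    m, n = len(A), len(A[0])
    best_cost, best_B = None, None
    for u_bits in product((0, 1), repeat=m * r):
        U = [u_bits[i * r:(i + 1) * r] for i in range(m)]
        for v_bits in product((0, 1), repeat=r * n):
            V = [v_bits[k * n:(k + 1) * n] for k in range(r)]
            B = [[sum(U[i][k] & V[k][j] for k in range(r)) % 2
                  for j in range(n)] for i in range(m)]
            cost = l0_distance(A, B)
            if best_cost is None or cost < best_cost:
                best_cost, best_B = cost, B
    return best_cost, best_B


# Tiny example: A has GF(2)-rank 2, so any rank-1 matrix differs from it.
A = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
cost, B = brute_force_low_gf2_rank(A, 1)
print(cost, B)
```

A $(1+ε)$-approximate solution in the abstract's sense is any binary matrix of GF(2)-rank at most $r$ whose $\ell_0$ distance to $A$ is at most $(1+ε)$ times the optimum cost that this search computes.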

