Frank Ban

3papers

41citations

Novelty65%

AI Score27

Ranked #162,189 of 201,326 authors (top 81%)#504 in DS (top 90%)

3 Papers

DSNov 16, 2019

Regularized Weighted Low Rank Approximation

Frank Ban, David Woodruff, Qiuyi Zhang

The classical low rank approximation problem is to find a rank $k$ matrix $UV$ (where $U$ has $k$ columns and $V$ has $k$ rows) that minimizes the Frobenius norm of $A - UV$. Although this problem can be solved efficiently, we study an NP-hard variant of this problem that involves weights and regularization. A previous paper of [Razenshteyn et al. '16] derived a polynomial time algorithm for weighted low rank approximation with constant rank. We derive provably sharper guarantees for the regularized version by obtaining parameterized complexity bounds in terms of the statistical dimension rather than the rank, allowing for a rank-independent runtime that can be significantly faster. Our improvement comes from applying sharper matrix concentration bounds, using a novel conditioning technique, and proving structural theorems for regularized low rank problems.

DSJul 12, 2019

Efficient average-case population recovery in the presence of insertions and deletions

Frank Ban, Xi Chen, Rocco A. Servedio et al.

Several recent works have considered the \emph{trace reconstruction problem}, in which an unknown source string $x\in\{0,1\}^n$ is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a \emph{trace} of $x$. The goal is to reconstruct the original string~$x$ from independent traces of $x$. While the best algorithms known for worst-case strings use $\exp(O(n^{1/3}))$ traces \cite{DOS17,NazarovPeres17}, highly efficient algorithms are known \cite{PZ17,HPP18} for the \emph{average-case} version, in which $x$ is uniformly random. We consider a generalization of this average-case trace reconstruction problem, which we call \emph{average-case population recovery in the presence of insertions and deletions}. In this problem, there is an unknown distribution $\cal{D}$ over $s$ unknown source strings $x^1,\dots,x^s \in \{0,1\}^n$, and each sample is independently generated by drawing some $x^i$ from $\cal{D}$ and returning an independent trace of $x^i$. Building on \cite{PZ17} and \cite{HPP18}, we give an efficient algorithm for this problem. For any support size $s \leq \smash{\exp(Θ(n^{1/3}))}$, for a $1-o(1)$ fraction of all $s$-element support sets $\{x^1,\dots,x^s\} \subset \{0,1\}^n$, for every distribution $\cal{D}$ supported on $\{x^1,\dots,x^s\}$, our algorithm efficiently recovers ${\cal D}$ up to total variation distance $ε$ with high probability, given access to independent traces of independent draws from $\cal{D}$. The algorithm runs in time poly$(n,s,1/ε)$ and its sample complexity is poly$(s,1/ε,\exp(\log^{1/3}n)).$ This polynomial dependence on the support size $s$ is in sharp contrast with the \emph{worst-case} version (when $x^1,\dots,x^s$ may be any strings in $\{0,1\}^n$), in which the sample complexity of the most efficient known algorithm \cite{BCFSS19} is doubly exponential in $s$.

DSJul 16, 2018

A PTAS for $\ell_p$-Low Rank Approximation

Frank Ban, Vijay Bhattiprolu, Karl Bringmann et al.

A number of recent works have studied algorithms for entrywise $\ell_p$-low rank approximation, namely, algorithms which given an $n \times d$ matrix $A$ (with $n \geq d$), output a rank-$k$ matrix $B$ minimizing $\|A-B\|_p^p=\sum_{i,j}|A_{i,j}-B_{i,j}|^p$ when $p > 0$; and $\|A-B\|_0=\sum_{i,j}[A_{i,j}\neq B_{i,j}]$ for $p=0$. On the algorithmic side, for $p \in (0,2)$, we give the first $(1+ε)$-approximation algorithm running in time $n^{\text{poly}(k/ε)}$. Further, for $p = 0$, we give the first almost-linear time approximation scheme for what we call the Generalized Binary $\ell_0$-Rank-$k$ problem. Our algorithm computes $(1+ε)$-approximation in time $(1/ε)^{2^{O(k)}/ε^{2}} \cdot nd^{1+o(1)}$. On the hardness of approximation side, for $p \in (1,2)$, assuming the Small Set Expansion Hypothesis and the Exponential Time Hypothesis (ETH), we show that there exists $δ:= δ(α) > 0$ such that the entrywise $\ell_p$-Rank-$k$ problem has no $α$-approximation algorithm running in time $2^{k^δ}$.