Jan van den Brand

12.7 DS Mar 29

An Optimal Algorithm for Stochastic Vertex Cover

Jan van den Brand, Inge Li Gørtz, Chirag Pabbaraju et al.

The goal in the stochastic vertex cover problem is to obtain an approximately minimum vertex cover for a graph $G^\star$ that is realized by sampling each edge independently with some probability $p\in (0, 1]$ in a base graph $G = (V, E)$. The algorithm is given the base graph $G$ and the probability $p$ as inputs, but its only access to the realized graph $G^\star$ is through queries on individual edges in $G$ that reveal the existence (or not) of the queried edge in $G^\star$. In this paper, we resolve the central open question for this problem: to find a $(1+\varepsilon)$-approximate vertex cover using only $O_\varepsilon(n/p)$ edge queries. Prior to our work, there were two incomparable state-of-the-art results for this problem: a $(3/2+\varepsilon)$-approximation using $O_\varepsilon(n/p)$ queries (Derakhshan, Durvasula, and Haghtalab, 2023) and a $(1+\varepsilon)$-approximation using $O_\varepsilon((n/p)\cdot \mathrm{RS}(n))$ queries (Derakhshan, Saneian, and Xun, 2025), where $\mathrm{RS}(n)$ is known to be at least $2^{\Omega\left(\frac{\log n}{\log \log n}\right)}$ and could be as large as $\frac{n}{2^{\Theta(\log^* n)}}$. Our improved upper bound of $O_{\varepsilon}(n/p)$ matches the known lower bound of $\Omega(n/p)$ for any constant-factor approximation algorithm for this problem (Behnezhad, Blum, and Derakhshan, 2022). A key tool in our result is a new concentration bound for the size of minimum vertex cover on random graphs, which might be of independent interest.
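
To make the query model above concrete, here is a minimal Python sketch. It is not the paper's algorithm: the class `EdgeOracle` and the function `matching_cover` are hypothetical names introduced only for illustration, and the baseline shown is the textbook greedy maximal-matching 2-approximation, which may use up to $|E|$ queries rather than $O_\varepsilon(n/p)$.

```python
import random


class EdgeOracle:
    """Hypothetical oracle for the realized graph G* (illustration only)."""

    def __init__(self, edges, p, seed=0):
        rng = random.Random(seed)
        # Each edge of the base graph is realized independently with probability p.
        self._realized = {frozenset(e) for e in edges if rng.random() < p}
        self.queries = 0  # number of edge queries issued so far

    def query(self, u, v):
        """Reveal whether the base edge (u, v) exists in G*."""
        self.queries += 1
        return frozenset((u, v)) in self._realized


def matching_cover(edges, oracle):
    """Naive baseline, NOT the paper's method: greedy maximal matching on G*.

    Returns a vertex cover of G* of size at most twice the minimum.
    """
    cover = set()
    for u, v in edges:
        if u in cover or v in cover:
            continue  # edge is already covered, so no query is needed
        if oracle.query(u, v):
            cover.update((u, v))  # matched edge: take both endpoints
    return cover


if __name__ == "__main__":
    base_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    oracle = EdgeOracle(base_edges, p=0.5, seed=42)
    cover = matching_cover(base_edges, oracle)
    print("cover:", sorted(cover), "queries used:", oracle.queries)
```

The oracle counts queries so that the cost measure from the abstract (number of edge queries, not running time) can be observed directly.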

23.2 LG Jun 20, 2020

Training (Overparametrized) Neural Networks in Near-Linear Time

Jan van den Brand, Binghui Peng, Zhao Song et al.

The slow convergence rate and pathological curvature issues of first-order gradient methods for training deep neural networks have initiated an ongoing effort to develop faster $\mathit{second}$-$\mathit{order}$ optimization algorithms beyond SGD, without compromising the generalization error. Despite their remarkable convergence rate ($\mathit{independent}$ of the training batch size $n$), second-order algorithms incur a daunting slowdown in the $\mathit{cost}$ $\mathit{per}$ $\mathit{iteration}$ (inverting the Hessian matrix of the loss function), which renders them impractical. Very recently, this computational overhead was mitigated by the works of [ZMG19, CGH+19], yielding an $O(mn^2)$-time second-order algorithm for training two-layer overparametrized neural networks of polynomial width $m$. We show how to speed up the algorithm of [CGH+19], achieving an $\tilde{O}(mn)$-time backpropagation algorithm for training (mildly overparametrized) ReLU networks, which is near-linear in the dimension ($mn$) of the full gradient (Jacobian) matrix. The centerpiece of our algorithm is to reformulate the Gauss-Newton iteration as an $\ell_2$-regression problem, and then use a Fast-JL type dimension reduction to $\mathit{precondition}$ the underlying Gram matrix in time independent of $M$, allowing us to find a sufficiently good approximate solution via $\mathit{first}$-$\mathit{order}$ conjugate gradient. Our result provides a proof-of-concept that advanced machinery from randomized linear algebra -- which led to recent breakthroughs in $\mathit{convex}$ $\mathit{optimization}$ (ERM, LPs, Regression) -- can be carried over to the realm of deep learning as well.
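
As a hedged sketch of the reformulation mentioned above, the Gauss-Newton step can be written as a least-squares problem in the Gram matrix. The notation here is assumed for illustration and does not come from the abstract: $J_t$ is the Jacobian of the network outputs with respect to the weights at iteration $t$, $f_t$ is the vector of predictions on the $n$ training points, $y$ is the label vector, and $W_t$ are the weights.

$$
g_t \in \arg\min_{g \in \mathbb{R}^n} \left\| J_t J_t^\top g - (f_t - y) \right\|_2,
\qquad
W_{t+1} = W_t - J_t^\top g_t .
$$

When $J_t J_t^\top$ is invertible, the minimizer is $g_t = (J_t J_t^\top)^{-1}(f_t - y)$, so the update coincides with the usual Gauss-Newton step. Under this view, an approximate $g_t$ suffices, which is where a preconditioned iterative solver such as conjugate gradient can replace an exact inversion.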
