Jonathan Shi

DS
6papers
485citations
Novelty63%
AI Score49

6 Papers

84.3PRMay 20
Potential Hessian Ascent: The Sherrington-Kirkpatrick Model

David Jekel, Juspreet Singh Sandhu, Jonathan Shi · harvard

We present the first iterative spectral algorithm to find near-optimal solutions for a random quadratic objective over the discrete hypercube, resolving a conjecture of Subag [Subag, Communications on Pure and Applied Mathematics, 74(5), 2021]. The algorithm is a randomized Hessian ascent in the solid cube, with the objective modified by subtracting an instance-independent potential function [Chen et al., Communications on Pure and Applied Mathematics, 76(7), 2023]. Using tools from free probability theory, we construct an approximate projector into the top eigenspaces of the Hessian, which serves as the covariance matrix for the random increments. With high probability, the iterates' empirical distribution approximates the solution to the primal version of the Auffinger-Chen SDE [Auffinger et al., Communications in Mathematical Physics, 335, 2015]. The per-iterate change in the modified objective is bounded via a Taylor expansion, where the derivatives are controlled through Gaussian concentration bounds and smoothness properties of a semiconcave regularization of the Fenchel-Legendre dual to the Parisi PDE. These results lay the groundwork for (possibly) demonstrating low-degree sum-of-squares certificates over high-entropy step distributions for a relaxed version of the Parisi formula [Open Question 1.8, arXiv:2401.14383].

LGMar 5, 2022
A Robust Spectral Algorithm for Overcomplete Tensor Decomposition

Samuel B. Hopkins, Tselil Schramm, Jonathan Shi

We give a spectral algorithm for decomposing overcomplete order-4 tensors, so long as their components satisfy an algebraic non-degeneracy condition that holds for nearly all (all but an algebraic set of measure $0$) tensors over $(\mathbb{R}^d)^{\otimes 4}$ with rank $n \le d^2$. Our algorithm is robust to adversarial perturbations of bounded spectral norm. Our algorithm is inspired by one which uses the sum-of-squares semidefinite programming hierarchy (Ma, Shi, and Steurer STOC'16, arXiv:1610.01980), and we achieve comparable robustness and overcompleteness guarantees under similar algebraic assumptions. However, our algorithm avoids semidefinite programming and may be implemented as a series of basic linear-algebraic operations. We consequently obtain a much faster running time than semidefinite programming methods: our algorithm runs in time $\tilde O(n^2d^3) \le \tilde O(d^7)$, which is subquadratic in the input size $d^4$ (where we have suppressed factors related to the condition number of the input tensor).

72.3PRMay 5
Potential Hessian Ascent III: Sampling the Sherrington--Kirkpatrick Model at Beta < 1/2

Ewan Davies, Holden Lee, Juspreet Singh Sandhu et al.

We give a polynomial-time algorithm to sample from the Gibbs measure of the Sherrington--Kirkpatrick model with negligible total-variation distance (TVD) error up to inverse temperature $β< 1/2$. Prior work obtained TVD error guarantees only up to $β\approx 0.295$, while results covering the entire replica-symmetric regime $β< 1$ gave guarantees only in Wasserstein distance. Our approach demonstrates that the same potential Hessian ascent previously developed for optimization also functions as a sampling algorithm by implementing algorithmic stochastic localization at high temperature. By estimating the covariance of the tilted Gibbs distribution via Gaussian integration by parts, overlap concentration, and precise cavity estimates, we show that a Hessian-ascent process achieves an $O(1)$ Wasserstein error guarantee for finite-time localization, improving on the previous $o(n)$. A careful comparison of stochastic localization with the Hessian ascent process and a free probability argument controlling the diagonal sub-algebra of the Hessian improves this to $O(1)$ in KL divergence. We then use Jarzynski's equality with rejection sampling, along with a restricted log-Sobolev inequality on the time-$T$ localized distribution, to refine the error to $o(1)$ in TVD up to a constant time $T$ and to complete the sampling with Glauber dynamics.

DSOct 6, 2016
Polynomial-time Tensor Decompositions with Sum-of-Squares

Tengyu Ma, Jonathan Shi, David Steurer

We give new algorithms based on the sum-of-squares method for tensor decomposition. Our results improve the best known running times from quasi-polynomial to polynomial for several problems, including decomposing random overcomplete 3-tensors and learning overcomplete dictionaries with constant relative sparsity. We also give the first robust analysis for decomposing overcomplete 4-tensors in the smoothed analysis model. A key ingredient of our analysis is to establish small spectral gaps in moment matrices derived from solutions to sum-of-squares relaxations. To enable this analysis we augment sum-of-squares relaxations with spectral analogs of maximum entropy constraints.

DSDec 8, 2015
Fast spectral algorithms from sum-of-squares proofs: tensor decomposition and planted sparse vectors

Samuel B. Hopkins, Tselil Schramm, Jonathan Shi et al.

We consider two problems that arise in machine learning applications: the problem of recovering a planted sparse vector in a random linear subspace and the problem of decomposing a random low-rank overcomplete 3-tensor. For both problems, the best known guarantees are based on the sum-of-squares method. We develop new algorithms inspired by analyses of the sum-of-squares method. Our algorithms achieve the same or similar guarantees as sum-of-squares for these problems but the running time is significantly faster. For the planted sparse vector problem, we give an algorithm with running time nearly linear in the input size that approximately recovers a planted sparse vector with up to constant relative sparsity in a random subspace of $\mathbb R^n$ of dimension up to $\tilde Ω(\sqrt n)$. These recovery guarantees match the best known ones of Barak, Kelner, and Steurer (STOC 2014) up to logarithmic factors. For tensor decomposition, we give an algorithm with running time close to linear in the input size (with exponent $\approx 1.086$) that approximately recovers a component of a random 3-tensor over $\mathbb R^n$ of rank up to $\tilde Ω(n^{4/3})$. The best previous algorithm for this problem due to Ge and Ma (RANDOM 2015) works up to rank $\tilde Ω(n^{3/2})$ but requires quasipolynomial time.

LGJul 12, 2015
Tensor principal component analysis via sum-of-squares proofs

Samuel B. Hopkins, Jonathan Shi, David Steurer

We study a statistical model for the tensor principal component analysis problem introduced by Montanari and Richard: Given a order-$3$ tensor $T$ of the form $T = τ\cdot v_0^{\otimes 3} + A$, where $τ\geq 0$ is a signal-to-noise ratio, $v_0$ is a unit vector, and $A$ is a random noise tensor, the goal is to recover the planted vector $v_0$. For the case that $A$ has iid standard Gaussian entries, we give an efficient algorithm to recover $v_0$ whenever $τ\geq ω(n^{3/4} \log(n)^{1/4})$, and certify that the recovered vector is close to a maximum likelihood estimator, all with high probability over the random choice of $A$. The previous best algorithms with provable guarantees required $τ\geq Ω(n)$. In the regime $τ\leq o(n)$, natural tensor-unfolding-based spectral relaxations for the underlying optimization problem break down (in the sense that their integrality gap is large). To go beyond this barrier, we use convex relaxations based on the sum-of-squares method. Our recovery algorithm proceeds by rounding a degree-$4$ sum-of-squares relaxations of the maximum-likelihood-estimation problem for the statistical model. To complement our algorithmic results, we show that degree-$4$ sum-of-squares relaxations break down for $τ\leq O(n^{3/4}/\log(n)^{1/4})$, which demonstrates that improving our current guarantees (by more than logarithmic factors) would require new techniques or might even be intractable. Finally, we show how to exploit additional problem structure in order to solve our sum-of-squares relaxations, up to some approximation, very efficiently. Our fastest algorithm runs in nearly-linear time using shifted (matrix) power iteration and has similar guarantees as above. The analysis of this algorithm also confirms a variant of a conjecture of Montanari and Richard about singular vectors of tensor unfoldings.