OCApr 10, 2018
The non-convex Burer-Monteiro approach works on smooth semidefinite programsNicolas Boumal, Vladislav Voroninski, Afonso S. Bandeira
Semidefinite programs (SDPs) can be solved in polynomial time by interior point methods, but scalability can be an issue. To address this shortcoming, over a decade ago, Burer and Monteiro proposed to solve SDPs with few equality constraints via rank-restricted, non-convex surrogates. Remarkably, for some applications, local optimization methods seem to converge to global optima of these non-convex surrogates reliably. Although some theory supports this empirical success, a complete explanation of it remains an open question. In this paper, we consider a class of SDPs which includes applications such as max-cut, community detection in the stochastic block model, robust PCA, phase retrieval and synchronization of rotations. We show that the low-rank Burer--Monteiro formulation of SDPs in that class almost never has any spurious local optima.
OCMay 28, 2019
Deterministic guarantees for Burer-Monteiro factorizations of smooth semidefinite programsNicolas Boumal, Vladislav Voroninski, Afonso S. Bandeira
We consider semidefinite programs (SDPs) with equality constraints. The variable to be optimized is a positive semidefinite matrix $X$ of size $n$. Following the Burer--Monteiro approach, we optimize a factor $Y$ of size $n \times p$ instead, such that $X = YY^T$. This ensures positive semidefiniteness at no cost and can reduce the dimension of the problem if $p$ is small, but results in a non-convex optimization problem with a quadratic cost function and quadratic equality constraints in $Y$. In this paper, we show that if the set of constraints on $Y$ regularly defines a smooth manifold, then, despite non-convexity, first- and second-order necessary optimality conditions are also sufficient, provided $p$ is large enough. For smaller values of $p$, we show a similar result holds for almost all (linear) cost functions. Under those conditions, a global optimum $Y$ maps to a global optimum $X = YY^T$ of the SDP. We deduce old and new consequences for SDP relaxations of the generalized eigenvector problem, the trust-region subproblem and quadratic optimization over several spheres, as well as for the Max-Cut and Orthogonal-Cut SDPs which are common relaxations in stochastic block modeling and synchronization of rotations.
DIS-NNFeb 27, 2023
Injectivity of ReLU networks: perspectives from statistical physicsAntoine Maillard, Afonso S. Bandeira, David Belius et al.
When can the input of a ReLU neural network be inferred from its output? In other words, when is the network injective? We consider a single layer, $x \mapsto \mathrm{ReLU}(Wx)$, with a random Gaussian $m \times n$ matrix $W$, in a high-dimensional setting where $n, m \to \infty$. Recent work connects this problem to spherical integral geometry giving rise to a conjectured sharp injectivity threshold for $α= \frac{m}{n}$ by studying the expected Euler characteristic of a certain random set. We adopt a different perspective and show that injectivity is equivalent to a property of the ground state of the spherical perceptron, an important spin glass model in statistical physics. By leveraging the (non-rigorous) replica symmetry-breaking theory, we derive analytical equations for the threshold whose solution is at odds with that from the Euler characteristic. Furthermore, we use Gordon's min--max theorem to prove that a replica-symmetric upper bound refutes the Euler characteristic prediction. Along the way we aim to give a tutorial-style introduction to key ideas from statistical physics in an effort to make the exposition accessible to a broad audience. Our analysis establishes a connection between spin glasses and integral geometry but leaves open the problem of explaining the discrepancies.
PRJul 3, 2023
Fitting an ellipsoid to a quadratic number of random pointsAfonso S. Bandeira, Antoine Maillard, Shahar Mendelson et al.
We consider the problem $(\mathrm{P})$ of fitting $n$ standard Gaussian random vectors in $\mathbb{R}^d$ to the boundary of a centered ellipsoid, as $n, d \to \infty$. This problem is conjectured to have a sharp feasibility transition: for any $\varepsilon > 0$, if $n \leq (1 - \varepsilon) d^2 / 4$ then $(\mathrm{P})$ has a solution with high probability, while $(\mathrm{P})$ has no solutions with high probability if $n \geq (1 + \varepsilon) d^2 /4$. So far, only a trivial bound $n \geq d^2 / 2$ is known on the negative side, while the best results on the positive side assume $n \leq d^2 / \mathrm{polylog}(d)$. In this work, we improve over previous approaches using a key result of Bartl & Mendelson (2022) on the concentration of Gram matrices of random vectors under mild assumptions on their tail behavior. This allows us to give a simple proof that $(\mathrm{P})$ is feasible with high probability when $n \leq d^2 / C$, for a (possibly large) constant $C > 0$.
PRMar 31
Randomstrasse101: Open Problems of 2025Afonso S. Bandeira, Daniil Dmitriev, Kevin Lucca et al.
Randomstrasse101 is a blog dedicated to Open Problems in Mathematics, with a focus on Probability Theory, Computation, Combinatorics, Statistics, and related topics. This manuscript serves as a stable record of the Open Problems posted in 2025, with the goal of easing academic referencing. The blog can currently be accessed at randomstrasse101.math.ethz.ch
OCFeb 2, 2021
Community Detection with a Subsampled Semidefinite ProgramPedro Abdalla, Afonso S. Bandeira
Semidefinite programming is an important tool to tackle several problems in data science and signal processing, including clustering and community detection. However, semidefinite programs are often slow in practice, so speed up techniques such as sketching are often considered. In the context of community detection in the stochastic block model, Mixon and Xie \cite{mixon2020sketching} have recently proposed a sketching framework in which a semidefinite program is solved only on a subsampled subgraph of the network, giving rise to significant computational savings. In this short paper, we provide a positive answer to a conjecture of Mixon and Xie about the statistical limits of this technique for the stochastic block model with two balanced communities.
STMay 22, 2020
The Average-Case Time Complexity of Certifying the Restricted Isometry PropertyYunzi Ding, Dmitriy Kunisky, Alexander S. Wein et al.
In compressed sensing, the restricted isometry property (RIP) on $M \times N$ sensing matrices (where $M < N$) guarantees efficient reconstruction of sparse vectors. A matrix has the $(s,δ)$-$\mathsf{RIP}$ property if behaves as a $δ$-approximate isometry on $s$-sparse vectors. It is well known that an $M\times N$ matrix with i.i.d. $\mathcal{N}(0,1/M)$ entries is $(s,δ)$-$\mathsf{RIP}$ with high probability as long as $s\lesssim δ^2 M/\log N$. On the other hand, most prior works aiming to deterministically construct $(s,δ)$-$\mathsf{RIP}$ matrices have failed when $s \gg \sqrt{M}$. An alternative way to find an RIP matrix could be to draw a random gaussian matrix and certify that it is indeed RIP. However, there is evidence that this certification task is computationally hard when $s \gg \sqrt{M}$, both in the worst case and the average case. In this paper, we investigate the exact average-case time complexity of certifying the RIP property for $M\times N$ matrices with i.i.d. $\mathcal{N}(0,1/M)$ entries, in the "possible but hard" regime $\sqrt{M} \ll s\lesssim M/\log N$. Based on analysis of the low-degree likelihood ratio, we give rigorous evidence that subexponential runtime $N^{\tildeΩ(s^2/M)}$ is required, demonstrating a smooth tradeoff between the maximum tolerated sparsity and the required computational power. This lower bound is essentially tight, matching the runtime of an existing algorithm due to Koiran and Zouzias. Our hardness result allows $δ$ to take any constant value in $(0,1)$, which captures the relevant regime for compressed sensing. This improves upon the existing average-case hardness result of Wang, Berthet, and Plan, which is limited to $δ= o(1)$.
STMay 21, 2020
Computationally efficient sparse clusteringMatthias Löffler, Alexander S. Wein, Afonso S. Bandeira
We study statistical and computational limits of clustering when the means of the centres are sparse and their dimension is possibly much larger than the sample size. Our theoretical analysis focuses on the model $X_i = z_i θ+ \varepsilon_i, ~z_i \in \{-1,1\}, ~\varepsilon_i \thicksim \mathcal{N}(0,I)$, which has two clusters with centres $θ$ and $-θ$. We provide a finite sample analysis of a new sparse clustering algorithm based on sparse PCA and show that it achieves the minimax optimal misclustering rate in the regime $\|θ\| \rightarrow \infty$. Our results require the sparsity to grow slower than the square root of the sample size. Using a recent framework for computational lower bounds -- the low-degree likelihood ratio -- we give evidence that this condition is necessary for any polynomial-time clustering algorithm to succeed below the BBP threshold. This complements existing evidence based on reductions and statistical query lower bounds. Compared to these existing results, we cover a wider set of parameter regimes and give a more precise understanding of the runtime required and the misclustering error achievable. Our results imply that a large class of tests based on low-degree polynomials fail to solve even the weak testing task.
OCAug 15, 2019
Experimental performance of graph neural networks on random instances of max-cutWeichi Yao, Afonso S. Bandeira, Soledad Villar
This note explores the applicability of unsupervised machine learning techniques towards hard optimization problems on random inputs. In particular we consider Graph Neural Networks (GNNs) -- a class of neural networks designed to learn functions on graphs -- and we apply them to the max-cut problem on random regular graphs. We focus on the max-cut problem on random regular graphs because it is a fundamental problem that has been widely studied. In particular, even though there is no known explicit solution to compare the output of our algorithm to, we can leverage the known asymptotics of the optimal max-cut value in order to evaluate the performance of the GNNs. In order to put the performance of the GNNs in context, we compare it with the classical semidefinite relaxation approach by Goemans and Williamson~(SDP), and with extremal optimization, which is a local optimization heuristic from the statistical physics literature. The numerical results we obtain indicate that, surprisingly, Graph Neural Networks attain comparable performance to the Goemans and Williamson SDP. We also observe that extremal optimization consistently outperforms the other two methods. Furthermore, the performances of the three methods present similar patterns, that is, for sparser, and for larger graphs, the size of the found cuts are closer to the asymptotic optimal max-cut value.
STJul 26, 2019
Notes on Computational Hardness of Hypothesis Testing: Predictions using the Low-Degree Likelihood RatioDmitriy Kunisky, Alexander S. Wein, Afonso S. Bandeira
These notes survey and explore an emerging method, which we call the low-degree method, for predicting and understanding statistical-versus-computational tradeoffs in high-dimensional inference problems. In short, the method posits that a certain quantity -- the second moment of the low-degree likelihood ratio -- gives insight into how much computational time is required to solve a given hypothesis testing problem, which can in turn be used to predict the computational hardness of a variety of statistical inference tasks. While this method originated in the study of the sum-of-squares (SoS) hierarchy of convex programs, we present a self-contained introduction that does not require knowledge of SoS. In addition to showing how to carry out predictions using the method, we include a discussion investigating both rigorous and conjectural consequences of these predictions. These notes include some new results, simplified proofs, and refined conjectures. For instance, we point out a formal connection between spectral methods and the low-degree likelihood ratio, and we give a sharp low-degree lower bound against subexponential-time algorithms for tensor PCA.
STJul 26, 2019
Subexponential-Time Algorithms for Sparse PCAYunzi Ding, Dmitriy Kunisky, Alexander S. Wein et al.
We study the computational cost of recovering a unit-norm sparse principal component $x \in \mathbb{R}^n$ planted in a random matrix, in either the Wigner or Wishart spiked model (observing either $W + λxx^\top$ with $W$ drawn from the Gaussian orthogonal ensemble, or $N$ independent samples from $\mathcal{N}(0, I_n + βxx^\top)$, respectively). Prior work has shown that when the signal-to-noise ratio ($λ$ or $β\sqrt{N/n}$, respectively) is a small constant and the fraction of nonzero entries in the planted vector is $\|x\|_0 / n = ρ$, it is possible to recover $x$ in polynomial time if $ρ\lesssim 1/\sqrt{n}$. While it is possible to recover $x$ in exponential time under the weaker condition $ρ\ll 1$, it is believed that polynomial-time recovery is impossible unless $ρ\lesssim 1/\sqrt{n}$. We investigate the precise amount of time required for recovery in the "possible but hard" regime $1/\sqrt{n} \ll ρ\ll 1$ by exploring the power of subexponential-time algorithms, i.e., algorithms running in time $\exp(n^δ)$ for some constant $δ\in (0,1)$. For any $1/\sqrt{n} \ll ρ\ll 1$, we give a recovery algorithm with runtime roughly $\exp(ρ^2 n)$, demonstrating a smooth tradeoff between sparsity and runtime. Our family of algorithms interpolates smoothly between two existing algorithms: the polynomial-time diagonal thresholding algorithm and the $\exp(ρn)$-time exhaustive search algorithm. Furthermore, by analyzing the low-degree likelihood ratio, we give rigorous evidence suggesting that the tradeoff achieved by our algorithms is optimal.
PRJul 8, 2018
Stochastic Block Model for Hypergraphs: Statistical limits and a semidefinite programming approachChiheon Kim, Afonso S. Bandeira, Michel X. Goemans
We study the problem of community detection in a random hypergraph model which we call the stochastic block model for $k$-uniform hypergraphs ($k$-SBM). We investigate the exact recovery problem in $k$-SBM and show that a sharp phase transition occurs around a threshold: below the threshold it is impossible to recover the communities with non-vanishing probability, yet above the threshold there is an estimator which recovers the communities almost asymptotically surely. We also consider a simple, efficient algorithm for the exact recovery problem which is based on a semidefinite relaxation technique.
STJul 2, 2018
Optimality and Sub-optimality of PCA I: Spiked Random Matrix ModelsAmelia Perry, Alexander S. Wein, Afonso S. Bandeira et al.
A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, introduced by Johnstone, in which a prominent eigenvector (or "spike") is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences. Baik, Ben Arous and Peche showed that the spiked Wishart ensemble exhibits a sharp phase transition asymptotically: when the spike strength is above a critical threshold, it is possible to detect the presence of a spike based on the top eigenvalue, and below the threshold the top eigenvalue provides no information. Such results form the basis of our understanding of when PCA can detect a low-rank signal in the presence of noise. However, under structural assumptions on the spike, not all information is necessarily contained in the spectrum. We study the statistical limits of tests for the presence of a spike, including non-spectral tests. Our results leverage Le Cam's notion of contiguity, and include: i) For the Gaussian Wigner ensemble, we show that PCA achieves the optimal detection threshold for certain natural priors for the spike. ii) For any non-Gaussian Wigner ensemble, PCA is sub-optimal for detection. However, an efficient variant of PCA achieves the optimal threshold (for natural priors) by pre-transforming the matrix entries. iii) For the Gaussian Wishart ensemble, the PCA threshold is optimal for positive spikes (for natural priors) but this is not always the case for negative spikes.
MLMar 29, 2018
Notes on computational-to-statistical gaps: predictions using statistical physicsAfonso S. Bandeira, Amelia Perry, Alexander S. Wein
In these notes we describe heuristics to predict computational-to-statistical gaps in certain statistical problems. These are regimes in which the underlying statistical problem is information-theoretically possible although no efficient algorithm exists, rendering the problem essentially unsolvable for large instances. The methods we describe here are based on mature, albeit non-rigorous, tools from statistical physics. These notes are based on a lecture series given by the authors at the Courant Institute of Mathematical Sciences in New York City, on May 16th, 2017.
OCFeb 18, 2018
Spurious Valleys in Two-layer Neural Network Optimization LandscapesLuca Venturi, Afonso S. Bandeira, Joan Bruna
Neural networks provide a rich class of high-dimensional, non-convex optimization problems. Despite their non-convexity, gradient-descent methods often successfully optimize these models. This has motivated a recent spur in research attempting to characterize properties of their loss surface that may explain such success. In this paper, we address this phenomenon by studying a key topological property of the loss: the presence or absence of spurious valleys, defined as connected components of sub-level sets that do not include a global minimum. Focusing on a class of two-layer neural networks defined by smooth (but generally non-linear) activation functions, we identify a notion of intrinsic dimension and show that it provides necessary and sufficient conditions for the absence of spurious valleys. More concretely, finite intrinsic dimension guarantees that for sufficiently overparametrised models no spurious valleys exist, independently of the data distribution. Conversely, infinite intrinsic dimension implies that spurious valleys do exist for certain data distributions, independently of model overparametrisation. Besides these positive and negative results, we show that, although spurious valleys may exist in general, they are confined to low risk levels and avoided with high probability on overparametrised models.
MLJun 22, 2017
Revised Note on Learning Algorithms for Quadratic Assignment with Graph Neural NetworksAlex Nowak, Soledad Villar, Afonso S. Bandeira et al.
Inverse problems correspond to a certain type of optimization problems formulated over appropriate input distributions. Recently, there has been a growing interest in understanding the computational hardness of these optimization problems, not only in the worst case, but in an average-complexity sense under this same input distribution. In this revised note, we are interested in studying another aspect of hardness, related to the ability to learn how to solve a problem by simply observing a collection of previously solved instances. These 'planted solutions' are used to supervise the training of an appropriate predictive model that parametrizes a broad class of algorithms, with the hope that the resulting model will provide good accuracy-complexity tradeoffs in the average sense. We illustrate this setup on the Quadratic Assignment Problem, a fundamental problem in Network Science. We observe that data-driven models based on Graph Neural Networks offer intriguingly good performance, even in regimes where standard relaxation based techniques appear to suffer.
PRDec 22, 2016
Statistical limits of spiked tensor modelsAmelia Perry, Alexander S. Wein, Afonso S. Bandeira
We study the statistical limits of both detecting and estimating a rank-one deformation of a symmetric random Gaussian tensor. We establish upper and lower bounds on the critical signal-to-noise ratio, under a variety of priors for the planted vector: (i) a uniformly sampled unit vector, (ii) i.i.d. $\pm 1$ entries, and (iii) a sparse vector where a constant fraction $ρ$ of entries are i.i.d. $\pm 1$ and the rest are zero. For each of these cases, our upper and lower bounds match up to a $1+o(1)$ factor as the order $d$ of the tensor becomes large. For sparse signals (iii), our bounds are also asymptotically tight in the sparse limit $ρ\to 0$ for any fixed $d$ (including the $d=2$ case of sparse PCA). Our upper bounds for (i) demonstrate a phenomenon reminiscent of the work of Baik, Ben Arous and Péché: an `eigenvalue' of a perturbed tensor emerges from the bulk at a strictly lower signal-to-noise ratio than when the perturbation itself exceeds the bulk; we quantify the size of this effect. We also provide some general results for larger classes of priors. In particular, the large $d$ asymptotics of the threshold location differs between problems with discrete priors versus continuous priors. Finally, for priors (i) and (ii) we carry out the replica prediction from statistical physics, which is conjectured to give the exact information-theoretic threshold for any fixed $d$. Of independent interest, we introduce a new improvement to the second moment method for contiguity, on which our lower bounds are based. Our technique conditions away from rare `bad' events that depend on interactions between the signal and noise. This enables us to close $\sqrt{2}$-factor gaps present in several previous works.
RODec 21, 2016
SE-Sync: A Certifiably Correct Algorithm for Synchronization over the Special Euclidean GroupDavid M. Rosen, Luca Carlone, Afonso S. Bandeira et al.
Many important geometric estimation problems take the form of synchronization over the special Euclidean group: estimate the values of a set of poses given a set of relative measurements between them. This problem is typically formulated as a nonconvex maximum-likelihood estimation that is computationally hard to solve in general. Nevertheless, in this paper we present an algorithm that is able to efficiently recover certifiably globally optimal solutions of the special Euclidean synchronization problem in a non-adversarial noise regime. The crux of our approach is the development of a semidefinite relaxation of the maximum-likelihood estimation whose minimizer provides an exact MLE so long as the magnitude of the noise corrupting the available measurements falls below a certain critical threshold; furthermore, whenever exactness obtains, it is possible to verify this fact a posteriori, thereby certifying the optimality of the recovered estimate. We develop a specialized optimization scheme for solving large-scale instances of this relaxation by exploiting its low-rank, geometric, and graph-theoretic structure to reduce it to an equivalent optimization problem on a low-dimensional Riemannian manifold, and design a truncated-Newton trust-region method to solve this reduction efficiently. Finally, we combine this fast optimization approach with a simple rounding procedure to produce our algorithm, SE-Sync. Experimental evaluation on a variety of simulated and real-world pose-graph SLAM datasets shows that SE-Sync is able to recover certifiably globally optimal solutions when the available measurements are corrupted by noise up to an order of magnitude greater than that typically encountered in robotics and computer vision applications, and does so more than an order of magnitude faster than the Gauss-Newton-based approach that forms the basis of current state-of-the-art techniques.
RONov 1, 2016
A Certifiably Correct Algorithm for Synchronization over the Special Euclidean GroupDavid M. Rosen, Luca Carlone, Afonso S. Bandeira et al.
Many geometric estimation problems take the form of synchronization over the special Euclidean group: estimate the values of a set of poses given noisy measurements of a subset of their pairwise relative transforms. This problem is typically formulated as a maximum-likelihood estimation that requires solving a nonconvex nonlinear program, which is computationally intractable in general. Nevertheless, in this paper we present an algorithm that is able to efficiently recover certifiably globally optimal solutions of this estimation problem in a non-adversarial noise regime. The crux of our approach is the development of a semidefinite relaxation of the maximum-likelihood estimation whose minimizer provides the exact MLE so long as the magnitude of the noise corrupting the available measurements falls below a certain critical threshold; furthermore, whenever exactness obtains, it is possible to verify this fact a posteriori, thereby certifying the optimality of the recovered estimate. We develop a specialized optimization scheme for solving large-scale instances of this semidefinite relaxation by exploiting its low-rank, geometric, and graph-theoretic structure to reduce it to an equivalent optimization problem on a low-dimensional Riemannian manifold, and then design a Riemannian truncated-Newton trust-region method to solve this reduction efficiently. We combine this fast optimization approach with a simple rounding procedure to produce our algorithm, SE-Sync. Experimental evaluation on a variety of simulated and real-world pose-graph SLAM datasets shows that SE-Sync is capable of recovering globally optimal solutions when the available measurements are corrupted by noise up to an order of magnitude greater than that typically encountered in robotics applications, and does so at a computational cost that scales comparably with that of direct Newton-type local search techniques.
GTOct 17, 2016
A polynomial-time relaxation of the Gromov-Hausdorff distanceSoledad Villar, Afonso S. Bandeira, Andrew J. Blumberg et al.
The Gromov-Hausdorff distance provides a metric on the set of isometry classes of compact metric spaces. Unfortunately, computing this metric directly is believed to be computationally intractable. Motivated by applications in shape matching and point-cloud comparison, we study a semidefinite programming relaxation of the Gromov-Hausdorff metric. This relaxation can be computed in polynomial time, and somewhat surprisingly is itself a pseudometric. We describe the induced topology on the set of compact metric spaces. Finally, we demonstrate the numerical performance of various algorithms for computing the relaxed distance and apply these algorithms to several relevant data sets. In particular we propose a greedy algorithm for finding the best correspondence between finite metric spaces that can handle hundreds of points.
ITOct 14, 2016
Message-passing algorithms for synchronization problems over compact groupsAmelia Perry, Alexander S. Wein, Afonso S. Bandeira et al.
Various alignment problems arising in cryo-electron microscopy, community detection, time synchronization, computer vision, and other fields fall into a common framework of synchronization problems over compact groups such as Z/L, U(1), or SO(3). The goal of such problems is to estimate an unknown vector of group elements given noisy relative observations. We present an efficient iterative algorithm to solve a large class of these problems, allowing for any compact group, with measurements on multiple 'frequency channels' (Fourier modes, or more generally, irreducible representations of the group). Our algorithm is a highly efficient iterative method following the blueprint of approximate message passing (AMP), which has recently arisen as a central technique for inference problems such as structured low-rank estimation and compressed sensing. We augment the standard ideas of AMP with ideas from representation theory so that the algorithm can work with distributions over compact groups. Using standard but non-rigorous methods from statistical physics we analyze the behavior of our algorithm on a Gaussian noise model, identifying phases where the problem is easy, (computationally) hard, and (statistically) impossible. In particular, such evidence predicts that our algorithm is information-theoretically optimal in many cases, and that the remaining cases show evidence of statistical-to-computational gaps.
STSep 19, 2016
Optimality and Sub-optimality of PCA for Spiked Random Matrices and SynchronizationAmelia Perry, Alexander S. Wein, Afonso S. Bandeira et al.
A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, in which a prominent eigenvector is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences. Baik, Ben Arous and Péché showed that the spiked Wishart ensemble exhibits a sharp phase transition asymptotically: when the signal strength is above a critical threshold, it is possible to detect the presence of a spike based on the top eigenvalue, and below the threshold the top eigenvalue provides no information. Such results form the basis of our understanding of when PCA can detect a low-rank signal in the presence of noise. However, not all the information about the spike is necessarily contained in the spectrum. We study the fundamental limitations of statistical methods, including non-spectral ones. Our results include: I) For the Gaussian Wigner ensemble, we show that PCA achieves the optimal detection threshold for a variety of benign priors for the spike. We extend previous work on the spherically symmetric and i.i.d. Rademacher priors through an elementary, unified analysis. II) For any non-Gaussian Wigner ensemble, we show that PCA is always suboptimal for detection. However, a variant of PCA achieves the optimal threshold (for benign priors) by pre-transforming the matrix entries according to a carefully designed function. This approach has been stated before, and we give a rigorous and general analysis. III) For both the Gaussian Wishart ensemble and various synchronization problems over groups, we show that inefficient procedures can work below the threshold where PCA succeeds, whereas no known efficient algorithm achieves this. This conjectural gap between what is statistically possible and what can be done efficiently remains open.
DSJul 8, 2015
Multisection in the Stochastic Block Model using Semidefinite ProgrammingNaman Agarwal, Afonso S. Bandeira, Konstantinos Koiliaris et al.
We consider the problem of identifying underlying community-like structures in graphs. Towards this end we study the Stochastic Block Model (SBM) on $k$-clusters: a random model on $n=km$ vertices, partitioned in $k$ equal sized clusters, with edges sampled independently across clusters with probability $q$ and within clusters with probability $p$, $p>q$. The goal is to recover the initial "hidden" partition of $[n]$. We study semidefinite programming (SDP) based algorithms in this context. In the regime $p = \frac{α\log(m)}{m}$ and $q = \frac{β\log(m)}{m}$ we show that a certain natural SDP based algorithm solves the problem of {\em exact recovery} in the $k$-community SBM, with high probability, whenever $\sqrtα - \sqrtβ > \sqrt{1}$, as long as $k=o(\log n)$. This threshold is known to be the information theoretically optimal. We also study the case when $k=θ(\log(n))$. In this case however we achieve recovery guarantees that no longer match the optimal condition $\sqrtα - \sqrtβ > \sqrt{1}$, thus leaving achieving optimality for this range an open question.
CVMay 14, 2015
Non-unique games over compact groups and orientation estimation in cryo-EMAfonso S. Bandeira, Yutong Chen, Amit Singer
Let $\mathcal{G}$ be a compact group and let $f_{ij} \in L^2(\mathcal{G})$. We define the Non-Unique Games (NUG) problem as finding $g_1,\dots,g_n \in \mathcal{G}$ to minimize $\sum_{i,j=1}^n f_{ij} \left( g_i g_j^{-1}\right)$. We devise a relaxation of the NUG problem to a semidefinite program (SDP) by taking the Fourier transform of $f_{ij}$ over $\mathcal{G}$, which can then be solved efficiently. The NUG framework can be seen as a generalization of the little Grothendieck problem over the orthogonal group and the Unique Games problem and includes many practically relevant problems, such as the maximum likelihood estimator} to registering bandlimited functions over the unit sphere in $d$-dimensions and orientation estimation in cryo-Electron Microscopy.
MLAug 18, 2014
Relax, no need to round: integrality of clustering formulationsPranjal Awasthi, Afonso S. Bandeira, Moses Charikar et al.
We study exact recovery conditions for convex relaxations of point cloud clustering problems, focusing on two of the most common optimization problems for unsupervised clustering: $k$-means and $k$-median clustering. Motivations for focusing on convex relaxations are: (a) they come with a certificate of optimality, and (b) they are generic tools which are relatively parameter-free, not tailored to specific assumptions over the input. More precisely, we consider the distributional setting where there are $k$ clusters in $\mathbb{R}^m$ and data from each cluster consists of $n$ points sampled from a symmetric distribution within a ball of unit radius. We ask: what is the minimal separation distance between cluster centers needed for convex relaxations to exactly recover these $k$ clusters as the optimal integral solution? For the $k$-median linear programming relaxation we show a tight bound: exact recovery is obtained given arbitrarily small pairwise separation $ε> 0$ between the balls. In other words, the pairwise center separation is $Δ> 2+ε$. Under the same distributional model, the $k$-means LP relaxation fails to recover such clusters at separation as large as $Δ= 4$. Yet, if we enforce PSD constraints on the $k$-means LP, we get exact cluster recovery at center separation $Δ> 2\sqrt2(1+\sqrt{1/m})$. In contrast, common heuristics such as Lloyd's algorithm (a.k.a. the $k$-means algorithm) can fail to recover clusters in this setting; even with arbitrarily large cluster separation, k-means++ with overseeding by any constant factor fails with high probability at exact cluster recovery. To complement the theoretical analysis, we provide an experimental study of the recovery guarantees for these various methods, and discuss several open problems which these experiments suggest.
LGApr 11, 2014
Compressive classification and the rare eclipse problemAfonso S. Bandeira, Dustin G. Mixon, Benjamin Recht
This paper addresses the fundamental question of when convex sets remain disjoint after random projection. We provide an analysis using ideas from high-dimensional convex geometry. For ellipsoids, we provide a bound in terms of the distance between these ellipsoids and simple functions of their polynomial coefficients. As an application, this theorem provides bounds for compressive classification of convex sets. Rather than assuming that the data to be classified is sparse, our results show that the data can be acquired via very few measurements yet will remain linearly separable. We demonstrate the feasibility of this approach in the context of hyperspectral imaging.
OCApr 10, 2014
Open problem: Tightness of maximum likelihood semidefinite relaxationsAfonso S. Bandeira, Yuehaw Khoo, Amit Singer
We have observed an interesting, yet unexplained, phenomenon: Semidefinite programming (SDP) based relaxations of maximum likelihood estimators (MLE) tend to be tight in recovery problems with noisy data, even when MLE cannot exactly recover the ground truth. Several results establish tightness of SDP based relaxations in the regime where exact recovery from MLE is possible. However, to the best of our knowledge, their tightness is not understood beyond this regime. As an illustrative example, we focus on the generalized Procrustes problem.