MLJun 29, 2022
Theoretical Perspectives on Deep Learning Methods in Inverse ProblemsJonathan Scarlett, Reinhard Heckel, Miguel R. D. Rodrigues et al.
In recent years, there have been significant advances in the use of deep learning methods in inverse problems such as denoising, compressive sensing, inpainting, and super-resolution. While this line of works has predominantly been driven by practical algorithms and experiments, it has also given rise to a variety of intriguing theoretical problems. In this paper, we survey some of the prominent theoretical developments in this line of works, focusing in particular on generative priors, untrained neural network priors, and unfolding algorithms. In addition to summarizing existing results in these topics, we highlight several ongoing challenges and open problems.
NAOct 7, 2013
Stable optimizationless recovery from phaseless linear measurementsLaurent Demanet, Paul Hand
We address the problem of recovering an n-vector from m linear measurements lacking sign or phase information. We show that lifting and semidefinite relaxation suffice by themselves for stable recovery in the setting of m = O(n log n) random sensing vectors, with high probability. The recovery method is optimizationless in the sense that trace minimization in the PhaseLift procedure is unnecessary. That is, PhaseLift reduces to a feasibility problem. The optimizationless perspective allows for a Douglas-Rachford numerical algorithm that is unavailable for PhaseLift. This method exhibits linear convergence with a favorable convergence rate and without any parameter tuning.
LGJun 1, 2022
Analysis of Catastrophic Forgetting for Random Orthogonal Transformation Tasks in the Overparameterized RegimeDaniel Goldfarb, Paul Hand
Overparameterization is known to permit strong generalization performance in neural networks. In this work, we provide an initial theoretical analysis of its effect on catastrophic forgetting in a continual learning setup. We show experimentally that in permuted MNIST image classification tasks, the generalization performance of multilayer perceptrons trained by vanilla stochastic gradient descent can be improved by overparameterization, and the extent of the performance increase achieved by overparameterization is comparable to that of state-of-the-art continual learning algorithms. We provide a theoretical explanation of this effect by studying a qualitatively similar two-task linear regression problem, where each task is related by a random orthogonal transformation. We show that when a model is trained on the two tasks in sequence without any additional regularization, the risk gain on the first task is small if the model is sufficiently overparameterized.
OCMay 28, 2014
Conditions for Existence of Dual Certificates in Rank-One Semidefinite ProblemsPaul Hand
Several signal recovery tasks can be relaxed into semidefinite programs with rank-one minimizers. A common technique for proving these programs succeed is to construct a dual certificate. Unfortunately, dual certificates may not exist under some formulations of semidefinite programs. In order to put problems into a form where dual certificate arguments are possible, it is important to develop conditions under which the certificates exist. In this paper, we provide an example where dual certificates do not exist. We then present a completeness condition under which they are guaranteed to exist. For programs that do not satisfy the completeness condition, we present a completion process which produces an equivalent program that does satisfy the condition. The important message of this paper is that dual certificates may not exist for semidefinite programs that involve orthogonal measurements with respect to positive-semidefinite matrices. Such measurements can interact with the positive-semidefinite constraint in a way that implies additional linear measurements. If these additional measurements are not included in the problem formulation, then dual certificates may fail to exist. As an illustration, we present a semidefinite relaxation for the task of finding the sparsest element in a subspace. One formulation of this program does not admit dual certificates. The completion process produces an equivalent formulation which does admit dual certificates.
LGMar 8, 2022
Regularized Training of Intermediate Layers for Generative Models for Inverse ProblemsSean Gunn, Jorio Cocola, Paul Hand
Generative Adversarial Networks (GANs) have been shown to be powerful and flexible priors when solving inverse problems. One challenge of using them is overcoming representation error, the fundamental limitation of the network in representing any particular signal. Recently, multiple proposed inversion algorithms reduce representation error by optimizing over intermediate layer representations. These methods are typically applied to generative models that were trained agnostic of the downstream inversion algorithm. In our work, we introduce a principle that if a generative model is intended for inversion using an algorithm based on optimization of intermediate layers, it should be trained in a way that regularizes those intermediate layers. We instantiate this principle for two notable recent inversion algorithms: Intermediate Layer Optimization and the Multi-Code GAN prior. For both of these inversion algorithms, we introduce a new regularized GAN training algorithm and demonstrate that the learned generative model results in lower reconstruction errors across a wide range of under sampling ratios when solving compressed sensing, inpainting, and super-resolution problems.
LGJan 23, 2024
The Joint Effect of Task Similarity and Overparameterization on Catastrophic Forgetting -- An Analytical ModelDaniel Goldfarb, Itay Evron, Nir Weinberger et al.
In continual learning, catastrophic forgetting is affected by multiple aspects of the tasks. Previous works have analyzed separately how forgetting is affected by either task similarity or overparameterization. In contrast, our paper examines how task similarity and overparameterization jointly affect forgetting in an analyzable model. Specifically, we focus on two-task continual linear regression, where the second task is a random orthogonal transformation of an arbitrary first task (an abstraction of random permutation tasks). We derive an exact analytical expression for the expected forgetting - and uncover a nuanced pattern. In highly overparameterized models, intermediate task similarity causes the most forgetting. However, near the interpolation threshold, forgetting decreases monotonically with the expected task similarity. We validate our findings with linear regression on synthetic data, and with neural networks on established permutation task benchmarks.
LGFeb 11, 2025
Analysis of Overparameterization in Continual Learning under a Linear ModelDaniel Goldfarb, Paul Hand
Autonomous machine learning systems that learn many tasks in sequence are prone to the catastrophic forgetting problem. Mathematical theory is needed in order to understand the extent of forgetting during continual learning. As a foundational step towards this goal, we study continual learning and catastrophic forgetting from a theoretical perspective in the simple setting of gradient descent with no explicit algorithmic mechanism to prevent forgetting. In this setting, we analytically demonstrate that overparameterization alone can mitigate forgetting in the context of a linear regression model. We consider a two-task setting motivated by permutation tasks, and show that as the overparameterization ratio becomes sufficiently high, a model trained on both tasks in sequence results in a low-risk estimator for the first task. As part of this work, we establish a non-asymptotic bound of the risk of a single linear regression task, which may be of independent interest to the field of double descent theory.
LGMar 7
Latent Generative Models with Tunable Complexity for Compressed Sensing and other Inverse ProblemsSean Gunn, Jorio Cocola, Oliver De Candido et al.
Generative models have emerged as powerful priors for solving inverse problems. These models typically represent a class of natural signals using a single fixed complexity or dimensionality. This can be limiting: depending on the problem, a fixed complexity may result in high representation error if too small, or overfitting to noise if too large. We develop tunable-complexity priors for diffusion models, normalizing flows, and variational autoencoders, leveraging nested dropout. Across tasks including compressed sensing, inpainting, denoising, and phase retrieval, we show empirically that tunable priors consistently achieve lower reconstruction errors than fixed-complexity baselines. In the linear denoising setting, we provide a theoretical analysis that explicitly characterizes how the optimal tuning parameter depends on noise and model structure. This work demonstrates the potential of tunable-complexity generative priors and motivates both the development of supporting theory and their application across a wide range of inverse problems.
LGOct 7, 2021
Score-based Generative Neural Networks for Large-Scale Optimal TransportMara Daniels, Tyler Maunu, Paul Hand
We consider the fundamental problem of sampling the optimal transport coupling between given source and target distributions. In certain cases, the optimal transport plan takes the form of a one-to-one mapping from the source support to the target support, but learning or even approximating such a map is computationally challenging for large and high-dimensional datasets due to the high cost of linear programming routines and an intrinsic curse of dimensionality. We study instead the Sinkhorn problem, a regularized form of optimal transport whose solutions are couplings between the source and the target distribution. We introduce a novel framework for learning the Sinkhorn coupling between two distributions in the form of a score-based generative model. Conditioned on source data, our procedure iterates Langevin Dynamics to sample target data according to the regularized optimal coupling. Key to this approach is a neural network parametrization of the Sinkhorn problem, and we prove convergence of gradient descent with respect to network parameters in this formulation. We demonstrate its empirical success on a variety of large scale optimal transport tasks.
CVFeb 22, 2021
Generator Surgery for Compressed SensingNiklas Smedemark-Margulies, Jung Yeon Park, Max Daniels et al.
Image recovery from compressive measurements requires a signal prior for the images being reconstructed. Recent work has explored the use of deep generative models with low latent dimension as signal priors for such problems. However, their recovery performance is limited by high representation error. We introduce a method for achieving low representation error using generators as signal priors. Using a pre-trained generator, we remove one or more initial blocks at test time and optimize over the new, higher-dimensional latent space to recover a target image. Experiments demonstrate significantly improved reconstruction quality for a variety of network architectures. This approach also works well for out-of-training-distribution images and is competitive with other state-of-the-art methods. Our experiments show that test-time architectural modifications can greatly improve the recovery quality of generator signal priors for compressed sensing.
LGOct 31, 2020
Optimal Sample Complexity of Subgradient Descent for Amplitude Flow via Non-Lipschitz Matrix ConcentrationPaul Hand, Oscar Leong, Vladislav Voroninski
We consider the problem of recovering a real-valued $n$-dimensional signal from $m$ phaseless, linear measurements and analyze the amplitude-based non-smooth least squares objective. We establish local convergence of subgradient descent with optimal sample complexity based on the uniform concentration of a random, discontinuous matrix-valued operator arising from the objective's gradient dynamics. While common techniques to establish uniform concentration of random functions exploit Lipschitz continuity, we prove that the discontinuous matrix-valued operator satisfies a uniform matrix concentration inequality when the measurement vectors are Gaussian as soon as $m = Ω(n)$ with high probability. We then show that satisfaction of this inequality is sufficient for subgradient descent with proper initialization to converge linearly to the true solution up to the global sign ambiguity. As a consequence, this guarantees local convergence for Gaussian measurements at optimal sample complexity. The concentration methods in the present work have previously been used to establish recovery guarantees for a variety of inverse problems under generative neural network priors. This paper demonstrates the applicability of these techniques to more traditional inverse problems and serves as a pedagogical introduction to those results.
ITAug 24, 2020
Compressive Phase Retrieval: Optimal Sample Complexity with Deep Generative PriorsPaul Hand, Oscar Leong, Vladislav Voroninski
Advances in compressive sensing provided reconstruction algorithms of sparse signals from linear measurements with optimal sample complexity, but natural extensions of this methodology to nonlinear inverse problems have been met with potentially fundamental sample complexity bottlenecks. In particular, tractable algorithms for compressive phase retrieval with sparsity priors have not been able to achieve optimal sample complexity. This has created an open problem in compressive phase retrieval: under generic, phaseless linear measurements, are there tractable reconstruction algorithms that succeed with optimal sample complexity? Meanwhile, progress in machine learning has led to the development of new data-driven signal priors in the form of generative models, which can outperform sparsity priors with significantly fewer measurements. In this work, we resolve the open problem in compressive phase retrieval and demonstrate that generative priors can lead to a fundamental advance by permitting optimal sample complexity by a tractable algorithm in this challenging nonlinear inverse problem. We additionally provide empirics showing that exploiting generative priors in phase retrieval can significantly outperform sparsity priors. These results provide support for generative priors as a new paradigm for signal recovery in a variety of contexts, both empirically and theoretically. The strengths of this paradigm are that (1) generative priors can represent some classes of natural signals more concisely than sparsity priors, (2) generative priors allow for direct optimization over the natural signal manifold, which is intractable under sparsity priors, and (3) the resulting non-convex optimization problems with generative priors can admit benign optimization landscapes at optimal sample complexity, perhaps surprisingly, even in cases of nonlinear measurements.
MLJun 14, 2020
Nonasymptotic Guarantees for Spiked Matrix Recovery with Generative PriorsJorio Cocola, Paul Hand, Vladislav Voroninski
Many problems in statistics and machine learning require the reconstruction of a rank-one signal matrix from noisy data. Enforcing additional prior information on the rank-one component is often key to guaranteeing good recovery performance. One such prior on the low-rank component is sparsity, giving rise to the sparse principal component analysis problem. Unfortunately, there is strong evidence that this problem suffers from a computational-to-statistical gap, which may be fundamental. In this work, we study an alternative prior where the low-rank component is in the range of a trained generative network. We provide a non-asymptotic analysis with optimal sample complexity, up to logarithmic factors, for rank-one matrix recovery under an expansive-Gaussian network prior. Specifically, we establish a favorable global optimization landscape for a nonlinear least squares objective, provided the number of samples is on the order of the dimensionality of the input to the generative model. This result suggests that generative priors have no computational-to-statistical gap for structured rank-one matrix recovery in the finite data, nonasymptotic regime. We present this analysis in the case of both the Wishart and Wigner spiked matrix models.
LGJun 14, 2020
Global Convergence of Sobolev Training for Overparameterized Neural NetworksJorio Cocola, Paul Hand
Sobolev loss is used when training a network to approximate the values and derivatives of a target function at a prescribed set of input points. Recent works have demonstrated its successful applications in various tasks such as distillation or synthetic gradient prediction. In this work we prove that an overparameterized two-layer relu neural network trained on the Sobolev loss with gradient flow from random initialization can fit any given function values and any given directional derivatives, under a separation condition on the input data.
LGJan 23, 2020
Reducing the Representation Error of GAN Image Priors Using the Deep DecoderMara Daniels, Paul Hand, Reinhard Heckel
Generative models, such as GANs, learn an explicit low-dimensional representation of a particular class of images, and so they may be used as natural image priors for solving inverse problems such as image restoration and compressive sensing. GAN priors have demonstrated impressive performance on these tasks, but they can exhibit substantial representation error for both in-distribution and out-of-distribution images, because of the mismatch between the learned, approximate image distribution and the data generating distribution. In this paper, we demonstrate a method for reducing the representation error of GAN priors by modeling images as the linear combination of a GAN prior with a Deep Decoder. The deep decoder is an underparameterized and most importantly unlearned natural signal model similar to the Deep Image Prior. No knowledge of the specific inverse problem is needed in the training of the GAN underlying our method. For compressive sensing and image superresolution, our hybrid model exhibits consistently higher PSNRs than both the GAN priors and Deep Decoder separately, both on in-distribution and out-of-distribution images. This model provides a method for extensibly and cheaply leveraging both the benefits of learned and unlearned image recovery priors in inverse problems.
OCMay 29, 2019
Global Guarantees for Blind Demodulation with Generative PriorsPaul Hand, Babhru Joshi
We study a deep learning inspired formulation for the blind demodulation problem, which is the task of recovering two unknown vectors from their entrywise multiplication. We consider the case where the unknown vectors are in the range of known deep generative models, $\mathcal{G}^{(1)}:\mathbb{R}^n\rightarrow\mathbb{R}^\ell$ and $\mathcal{G}^{(2)}:\mathbb{R}^p\rightarrow\mathbb{R}^\ell$. In the case when the networks corresponding to the generative models are expansive, the weight matrices are random and the dimension of the unknown vectors satisfy $\ell = Ω(n^2+p^2)$, up to log factors, we show that the empirical risk objective has a favorable landscape for optimization. That is, the objective function has a descent direction at every point outside of a small neighborhood around four hyperbolic curves. We also characterize the local maximizers of the empirical risk objective and, hence, show that there does not exist any other stationary points outside of these neighborhood around four hyperbolic curves and the set of local maximizers. We also implement a gradient descent scheme inspired by the geometry of the landscape of the objective function. In order to converge to a global minimizer, this gradient descent scheme exploits the fact that exactly one of the hyperbolic curve corresponds to the global minimizer, and thus points near this hyperbolic curve have a lower objective value than points close to the other spurious hyperbolic curves. We show that this gradient descent scheme can effectively remove distortions synthetically introduced to the MNIST dataset.
CVMay 28, 2019
Invertible generative models for inverse problems: mitigating representation error and dataset biasMuhammad Asim, Mara Daniels, Oscar Leong et al.
Trained generative models have shown remarkable performance as priors for inverse problems in imaging -- for example, Generative Adversarial Network priors permit recovery of test images from 5-10x fewer measurements than sparsity priors. Unfortunately, these models may be unable to represent any particular image because of architectural choices, mode collapse, and bias in the training dataset. In this paper, we demonstrate that invertible neural networks, which have zero representation error by design, can be effective natural signal priors at inverse problems such as denoising, compressive sensing, and inpainting. Given a trained generative model, we study the empirical risk formulation of the desired inverse problem under a regularization that promotes high likelihood images, either directly by penalization or algorithmically by initialization. For compressive sensing, invertible priors can yield higher accuracy than sparsity priors across almost all undersampling ratios, and due to their lack of representation error, invertible priors can yield better reconstructions than GAN priors for images that have rare features of variation within the biased training set, including out-of-distribution natural images. We additionally compare performance for compressive sensing to unlearned methods, such as the deep decoder, and we establish theoretical bounds on expected recovery error in the case of a linear invertible model.
CVOct 2, 2018
Deep Decoder: Concise Image Representations from Untrained Non-convolutional NetworksReinhard Heckel, Paul Hand
Deep neural networks, in particular convolutional neural networks, have become highly effective tools for compressing images and solving inverse problems including denoising, inpainting, and reconstruction from few and noisy measurements. This success can be attributed in part to their ability to represent and generate natural images well. Contrary to classical tools such as wavelets, image-generating deep neural networks have a large number of parameters---typically a multiple of their output dimension---and need to be trained on large datasets. In this paper, we propose an untrained simple image model, called the deep decoder, which is a deep neural network that can generate natural images from very few weight parameters. The deep decoder has a simple architecture with no convolutions and fewer weight parameters than the output dimensionality. This underparameterization enables the deep decoder to compress images into a concise set of network weights, which we show is on par with wavelet-based thresholding. Further, underparameterization provides a barrier to overfitting, allowing the deep decoder to have state-of-the-art performance for denoising. The deep decoder is simple in the sense that each layer has an identical structure that consists of only one upsampling unit, pixel-wise linear combination of channels, ReLU activation, and channelwise normalization. This simplicity makes the network amenable to theoretical analysis, and it sheds light on the aspects of neural networks that enable them to form effective signal representations.
ITJul 11, 2018
Phase Retrieval Under a Generative PriorPaul Hand, Oscar Leong, Vladislav Voroninski
The phase retrieval problem asks to recover a natural signal $y_0 \in \mathbb{R}^n$ from $m$ quadratic observations, where $m$ is to be minimized. As is common in many imaging problems, natural signals are considered sparse with respect to a known basis, and the generic sparsity prior is enforced via $\ell_1$ regularization. While successful in the realm of linear inverse problems, such $\ell_1$ methods have encountered possibly fundamental limitations, as no computationally efficient algorithm for phase retrieval of a $k$-sparse signal has been proven to succeed with fewer than $O(k^2\log n)$ generic measurements, exceeding the theoretical optimum of $O(k \log n)$. In this paper, we propose a novel framework for phase retrieval by 1) modeling natural signals as being in the range of a deep generative neural network $G : \mathbb{R}^k \rightarrow \mathbb{R}^n$ and 2) enforcing this prior directly by optimizing an empirical risk objective over the domain of the generator. Our formulation has provably favorable global geometry for gradient methods, as soon as $m = O(kd^2\log n)$, where $d$ is the depth of the network. Specifically, when suitable deterministic conditions on the generator and measurement matrix are met, we construct a descent direction for any point outside of a small neighborhood around the unique global minimizer and its negative multiple, and show that such conditions hold with high probability under Gaussian ensembles of multilayer fully-connected generator networks and measurement matrices. This formulation for structured phase retrieval thus has two advantages over sparsity based methods: 1) deep generative priors can more tightly represent natural signals and 2) information theoretically optimal sample complexity. We corroborate these results with experiments showing that exploiting generative models in phase retrieval tasks outperforms sparse phase retrieval methods.
ITMay 22, 2018
Rate-Optimal Denoising with Deep Neural NetworksReinhard Heckel, Wen Huang, Paul Hand et al.
Deep neural networks provide state-of-the-art performance for image denoising, where the goal is to recover a near noise-free image from a noisy observation. The underlying principle is that neural networks trained on large datasets have empirically been shown to be able to generate natural images well from a low-dimensional latent representation of the image. Given such a generator network, a noisy image can be denoised by i) finding the closest image in the range of the generator or by ii) passing it through an encoder-generator architecture (known as an autoencoder). However, there is little theory to justify this success, let alone to predict the denoising performance as a function of the network parameters. In this paper we consider the problem of denoising an image from additive Gaussian noise using the two generator based approaches. In both cases, we assume the image is well described by a deep neural network with ReLU activations functions, mapping a $k$-dimensional code to an $n$-dimensional image. In the case of the autoencoder, we show that the feedforward network reduces noise energy by a factor of $O(k/n)$. In the case of optimizing over the range of a generative model, we state and analyze a simple gradient algorithm that minimizes a non-convex loss function, and provably reduces noise energy by a factor of $O(k/n)$. We also demonstrate in numerical experiments that this denoising performance is, indeed, achieved by generative priors learned from data.
ITMay 22, 2017
Global Guarantees for Enforcing Deep Generative Priors by Empirical RiskPaul Hand, Vladislav Voroninski
We examine the theoretical properties of enforcing priors provided by generative deep neural networks via empirical risk minimization. In particular we consider two models, one in which the task is to invert a generative neural network given access to its last layer and another in which the task is to invert a generative neural network given only compressive linear observations of its last layer. We establish that in both cases, in suitable regimes of network layer sizes and a randomness assumption on the network weights, that the non-convex objective function given by empirical risk minimization does not have any spurious stationary points. That is, we establish that with high probability, at any point away from small neighborhoods around two scalar multiples of the desired solution, there is a descent direction. Hence, there are no local minima, saddle points, or other stationary points outside these neighborhoods. These results constitute the first theoretical guarantees which establish the favorable global geometry of these non-convex optimization problems, and they bridge the gap between the empirical success of enforcing deep generative priors and a rigorous understanding of non-linear inverse problems.
OCDec 19, 2016
A Convex Program for Mixed Linear Regression with a Recovery Guarantee for Well-Separated DataPaul Hand, Babhru Joshi
We introduce a convex approach for mixed linear regression over $d$ features. This approach is a second-order cone program, based on L1 minimization, which assigns an estimate regression coefficient in $\mathbb{R}^{d}$ for each data point. These estimates can then be clustered using, for example, $k$-means. For problems with two or more mixture classes, we prove that the convex program exactly recovers all of the mixture components in the noiseless setting under technical conditions that include a well-separation assumption on the data. Under these assumptions, recovery is possible if each class has at least $d$ independent measurements. We also explore an iteratively reweighted least squares implementation of this method on real and synthetic data.
CVAug 7, 2016
ShapeFit and ShapeKick for Robust, Scalable Structure from MotionThomas Goldstein, Paul Hand, Choongbum Lee et al.
We introduce a new method for location recovery from pair-wise directions that leverages an efficient convex program that comes with exact recovery guarantees, even in the presence of adversarial outliers. When pairwise directions represent scaled relative positions between pairs of views (estimated for instance with epipolar geometry) our method can be used for location recovery, that is the determination of relative pose up to a single unknown scale. For this task, our method yields performance comparable to the state-of-the-art with an order of magnitude speed-up. Our proposed numerical framework is flexible in that it accommodates other approaches to location recovery and can be used to speed up other methods. These properties are demonstrated by extensively testing against state-of-the-art methods for location recovery on 13 large, irregular collections of images of real scenes in addition to simulated data with ground truth.
CVSep 16, 2015
Exact simultaneous recovery of locations and structure from known orientations and corrupted point correspondencesPaul Hand, Choongbum Lee, Vladislav Voroninski
Let $t_1,\ldots,t_{n_l} \in \mathbb{R}^d$ and $p_1,\ldots,p_{n_s} \in \mathbb{R}^d$ and consider the bipartite location recovery problem: given a subset of pairwise direction observations $\{(t_i - p_j) / \|t_i - p_j\|_2\}_{i,j \in [n_l] \times [n_s]}$, where a constant fraction of these observations are arbitrarily corrupted, find $\{t_i\}_{i \in [n_ll]}$ and $\{p_j\}_{j \in [n_s]}$ up to a global translation and scale. We study the recently introduced ShapeFit algorithm as a method for solving this bipartite location recovery problem. In this case, ShapeFit consists of a simple convex program over $d(n_l + n_s)$ real variables. We prove that this program recovers a set of $n_l+n_s$ i.i.d. Gaussian locations exactly and with high probability if the observations are given by a bipartite Erdős-Rényi graph, $d$ is large enough, and provided that at most a constant fraction of observations involving any particular location are adversarially corrupted. This recovery theorem is based on a set of deterministic conditions that we prove are sufficient for exact recovery. Finally, we propose a modified pipeline for the Structure for Motion problem, based on this bipartite location recovery problem.
CVJun 4, 2015
ShapeFit: Exact location recovery from corrupted pairwise directionsPaul Hand, Choongbum Lee, Vladislav Voroninski
Let $t_1,\ldots,t_n \in \mathbb{R}^d$ and consider the location recovery problem: given a subset of pairwise direction observations $\{(t_i - t_j) / \|t_i - t_j\|_2\}_{i<j \in [n] \times [n]}$, where a constant fraction of these observations are arbitrarily corrupted, find $\{t_i\}_{i=1}^n$ up to a global translation and scale. We propose a novel algorithm for the location recovery problem, which consists of a simple convex program over $dn$ real variables. We prove that this program recovers a set of $n$ i.i.d. Gaussian locations exactly and with high probability if the observations are given by an \erdosrenyi graph, $d$ is large enough, and provided that at most a constant fraction of observations involving any particular location are adversarially corrupted. We also prove that the program exactly recovers Gaussian locations for $d=3$ if the fraction of corrupted observations at each location is, up to poly-logarithmic factors, at most a constant. Both of these recovery theorems are based on a set of deterministic conditions that we prove are sufficient for exact recovery.