97.7MLJun 4
Diffusion Models Observe Only Gradients: A Geometric Perspective on Score Matching ErrorsNaïl B. Khelifa, Richard E. Turner, Ramji Venkataramanan
Score-based diffusion models are typically trained by minimizing the $L^2$ score matching error, and standard theoretical analyses rely on this quantity to bound the sampling discrepancy between the learned and target distributions. We show the $L^2$ score error is not the right intrinsic measure of marginal distributional quality: a learned diffusion model can incur arbitrarily large $L^2$ score error while perfectly matching the target distribution. By decomposing score errors into a gradient and a solenoidal component (a Helmholtz-Hodge decomposition), we identify the geometric reason behind this: only the gradient component enters the marginal Fokker-Planck dynamics, while the solenoidal component is structurally invisible. We make this precise in three results. First, building on the corrected geometry, we prove an impossibility result: no monotone function of the $L^2$ score error can uniformly lower bound any divergence between the learned and target distributions. Second, we derive an upper bound on the Kullback-Leibler divergence that depends only on the observable gradient component of the error, tightening the standard Girsanov bound and identifying its looseness as the cost of operating on path-space rather than marginal-space dynamics. Third, we give a tractable estimator of the gradient component via a dual Sobolev identity, which is shown to empirically correlate substantially better with sample quality than the full $L^2$ error.
STNov 21, 2022
Precise Asymptotics for Spectral Methods in Mixed Generalized Linear ModelsYihan Zhang, Marco Mondelli, Ramji Venkataramanan
In a mixed generalized linear model, the goal is to learn multiple signals from unlabeled observations: each sample comes from exactly one signal, but it is not known which one. We consider the prototypical problem of estimating two statistically independent signals in a mixed generalized linear model with Gaussian covariates. Spectral methods are a popular class of estimators which output the top two eigenvectors of a suitable data-dependent matrix. However, despite the wide applicability, their design is still obtained via heuristic considerations, and the number of samples $n$ needed to guarantee recovery is super-linear in the signal dimension $d$. In this paper, we develop exact asymptotics on spectral methods in the challenging proportional regime in which $n, d$ grow large and their ratio converges to a finite constant. This allows us optimize the design of the spectral method, and combine it with a simple linear estimator, to minimize the estimation error. Our characterization exploits a mix of tools from random matrices, free probability and the theory of approximate message passing algorithms. Numerical simulations for mixed linear regression and phase retrieval demonstrate the advantage enabled by our analysis over existing designs of spectral methods.
STAug 28, 2023
Spectral Estimators for Structured Generalized Linear Models via Approximate Message PassingYihan Zhang, Hong Chang Ji, Ramji Venkataramanan et al.
We consider the problem of parameter estimation in a high-dimensional generalized linear model. Spectral methods obtained via the principal eigenvector of a suitable data-dependent matrix provide a simple yet surprisingly effective solution. However, despite their wide use, a rigorous performance characterization, as well as a principled way to preprocess the data, are available only for unstructured (i.i.d.\ Gaussian and Haar orthogonal) designs. In contrast, real-world data matrices are highly structured and exhibit non-trivial correlations. To address the problem, we consider correlated Gaussian designs capturing the anisotropic nature of the features via a covariance matrix $Σ$. Our main result is a precise asymptotic characterization of the performance of spectral estimators. This allows us to identify the optimal preprocessing that minimizes the number of samples needed for parameter estimation. Surprisingly, such preprocessing is universal across a broad set of designs, which partly addresses a conjecture on optimal spectral estimators for rotationally invariant models. Our principled approach vastly improves upon previous heuristic methods, including for designs common in computational imaging and genetics. The proposed methodology, based on approximate message passing, is broadly applicable and opens the way to the precise characterization of spiked matrices and of the corresponding spectral methods in a variety of settings.
MLApr 5, 2023
Mixed Regression via Approximate Message PassingNelvin Tan, Ramji Venkataramanan
We study the problem of regression in a generalized linear model (GLM) with multiple signals and latent variables. This model, which we call a matrix GLM, covers many widely studied problems in statistical learning, including mixed linear regression, max-affine regression, and mixture-of-experts. In mixed linear regression, each observation comes from one of $L$ signal vectors (regressors), but we do not know which one; in max-affine regression, each observation comes from the maximum of $L$ affine functions, each defined via a different signal vector. The goal in all these problems is to estimate the signals, and possibly some of the latent variables, from the observations. We propose a novel approximate message passing (AMP) algorithm for estimation in a matrix GLM and rigorously characterize its performance in the high-dimensional limit. This characterization is in terms of a state evolution recursion, which allows us to precisely compute performance measures such as the asymptotic mean-squared error. The state evolution characterization can be used to tailor the AMP algorithm to take advantage of any structural information known about the signals. Using state evolution, we derive an optimal choice of AMP `denoising' functions that minimizes the estimation error in each iteration. The theoretical results are validated by numerical simulations for mixed linear regression, max-affine regression, and mixture-of-experts. For max-affine regression, we propose an algorithm that combines AMP with expectation-maximization to estimate intercepts of the model along with the signals. The numerical results show that AMP significantly outperforms other estimators for mixed linear regression and max-affine regression in most parameter regimes.
MLMar 3, 2023
Statistical-Computational Tradeoffs in Mixed Sparse Linear RegressionGabriel Arpino, Ramji Venkataramanan
We consider the problem of mixed sparse linear regression with two components, where two real $k$-sparse signals $β_1, β_2$ are to be recovered from $n$ unlabelled noisy linear measurements. The sparsity is allowed to be sublinear in the dimension, and additive noise is assumed to be independent Gaussian with variance $σ^2$. Prior work has shown that the problem suffers from a $\frac{k}{SNR^2}$-to-$\frac{k^2}{SNR^2}$ statistical-to-computational gap, resembling other computationally challenging high-dimensional inference problems such as Sparse PCA and Robust Sparse Mean Estimation; here $SNR$ is the signal-to-noise ratio. We establish the existence of a more extensive computational barrier for this problem through the method of low-degree polynomials, but show that the problem is computationally hard only in a very narrow symmetric parameter regime. We identify a smooth information-computation tradeoff between the sample complexity $n$ and runtime for any randomized algorithm in this hard regime. Via a simple reduction, this provides novel rigorous evidence for the existence of a computational barrier to solving exact support recovery in sparse phase retrieval with sample complexity $n = \tilde{o}(k^2)$. Our second contribution is to analyze a simple thresholding algorithm which, outside of the narrow regime where the problem is hard, solves the associated mixed regression detection problem in $O(np)$ time with square-root the number of samples and matches the sample complexity required for (non-mixed) sparse linear regression; this allows the recovery problem to be subsequently solved by state-of-the-art techniques from the dense case. As a special case of our results, we show that this simple algorithm is order-optimal among a large family of algorithms in solving exact signed support recovery in sparse linear regression.
MLFeb 18
Error Propagation and Model Collapse in Diffusion Models: A Theoretical StudyNail B. Khelifa, Richard E. Turner, Ramji Venkataramanan
Machine learning models are increasingly trained or fine-tuned on synthetic data. Recursively training on such data has been observed to significantly degrade performance in a wide range of tasks, often characterized by a progressive drift away from the target distribution. In this work, we theoretically analyze this phenomenon in the setting of score-based diffusion models. For a realistic pipeline where each training round uses a combination of synthetic data and fresh samples from the target distribution, we obtain upper and lower bounds on the accumulated divergence between the generated and target distributions. This allows us to characterize different regimes of drift, depending on the score estimation error and the proportion of fresh data used in each generation. We also provide empirical results on synthetic data and images to illustrate the theory.
STFeb 9
Optimal Estimation in Orthogonally Invariant Generalized Linear Models: Spectral Initialization and Approximate Message PassingYihan Zhang, Hong Chang Ji, Ramji Venkataramanan et al.
We consider the problem of parameter estimation from a generalized linear model with a random design matrix that is orthogonally invariant in law. Such a model allows the design have an arbitrary distribution of singular values and only assumes that its singular vectors are generic. It is a vast generalization of the i.i.d. Gaussian design typically considered in the theoretical literature, and is motivated by the fact that real data often have a complex correlation structure so that methods relying on i.i.d. assumptions can be highly suboptimal. Building on the paradigm of spectrally-initialized iterative optimization, this paper proposes optimal spectral estimators and combines them with an approximate message passing (AMP) algorithm, establishing rigorous performance guarantees for these two algorithmic steps. Both the spectral initialization and the subsequent AMP meet existing conjectures on the fundamental limits to estimation -- the former on the optimal sample complexity for efficient weak recovery, and the latter on the optimal errors. Numerical experiments suggest the effectiveness of our methods and accuracy of our theory beyond orthogonally invariant data.
MLApr 11, 2024
Inferring Change Points in High-Dimensional Regression via Approximate Message PassingGabriel Arpino, Xiaoqi Liu, Julia Gontarek et al.
We consider the problem of localizing change points in a generalized linear model (GLM), a model that covers many widely studied problems in statistical learning including linear, logistic, and rectified linear regression. We propose a novel and computationally efficient Approximate Message Passing (AMP) algorithm for estimating both the signals and the change point locations, and rigorously characterize its performance in the high-dimensional limit where the number of parameters $p$ is proportional to the number of samples $n$. This characterization is in terms of a state evolution recursion, which allows us to precisely compute performance measures such as the asymptotic Hausdorff error of our change point estimates, and allows us to tailor the algorithm to take advantage of any prior structural information on the signals and change points. Moreover, we show how our AMP iterates can be used to efficiently compute a Bayesian posterior distribution over the change point locations in the high-dimensional limit. We validate our theory via numerical experiments, and demonstrate the favorable performance of our estimators on both synthetic and real data in the settings of linear, logistic, and rectified linear regression.
MLDec 8, 2021
Estimation in Rotationally Invariant Generalized Linear Models via Approximate Message PassingRamji Venkataramanan, Kevin Kögler, Marco Mondelli
We consider the problem of signal estimation in generalized linear models defined via rotationally invariant design matrices. Since these matrices can have an arbitrary spectral distribution, this model is well suited for capturing complex correlation structures which often arise in applications. We propose a novel family of approximate message passing (AMP) algorithms for signal estimation, and rigorously characterize their performance in the high-dimensional limit via a state evolution recursion. Our rotationally invariant AMP has complexity of the same order as the existing AMP derived under the restrictive assumption of a Gaussian design; our algorithm also recovers this existing AMP as a special case. Numerical results showcase a performance close to Vector AMP (which is conjectured to be Bayes-optimal in some settings), but obtained with a much lower complexity, as the proposed algorithm does not require a computationally expensive singular value decomposition.
MLJun 4, 2021
PCA Initialization for Approximate Message Passing in Rotationally Invariant ModelsMarco Mondelli, Ramji Venkataramanan
We study the problem of estimating a rank-$1$ signal in the presence of rotationally invariant noise-a class of perturbations more general than Gaussian noise. Principal Component Analysis (PCA) provides a natural estimator, and sharp results on its performance have been obtained in the high-dimensional regime. Recently, an Approximate Message Passing (AMP) algorithm has been proposed as an alternative estimator with the potential to improve the accuracy of PCA. However, the existing analysis of AMP requires an initialization that is both correlated with the signal and independent of the noise, which is often unrealistic in practice. In this work, we combine the two methods, and propose to initialize AMP with PCA. Our main result is a rigorous asymptotic characterization of the performance of this estimator. Both the AMP algorithm and its analysis differ from those previously derived in the Gaussian setting: at every iteration, our AMP algorithm requires a specific term to account for PCA initialization, while in the Gaussian case, PCA initialization affects only the first iteration of AMP. The proof is based on a two-phase artificial AMP that first approximates the PCA estimator and then mimics the true AMP. Our numerical simulations show an excellent agreement between AMP results and theoretical predictions, and suggest an interesting open direction on achieving Bayes-optimal performance.
STMay 5, 2021
A unifying tutorial on Approximate Message PassingOliver Y. Feng, Ramji Venkataramanan, Cynthia Rush et al.
Over the last decade or so, Approximate Message Passing (AMP) algorithms have become extremely popular in various structured high-dimensional statistical problems. The fact that the origins of these techniques can be traced back to notions of belief propagation in the statistical physics literature lends a certain mystique to the area for many statisticians. Our goal in this work is to present the main ideas of AMP from a statistical perspective, to illustrate the power and flexibility of the AMP framework. Along the way, we strengthen and unify many of the results in the existing literature.
MLOct 7, 2020
Approximate Message Passing with Spectral Initialization for Generalized Linear ModelsMarco Mondelli, Ramji Venkataramanan
We consider the problem of estimating a signal from measurements obtained via a generalized linear model. We focus on estimators based on approximate message passing (AMP), a family of iterative algorithms with many appealing features: the performance of AMP in the high-dimensional limit can be succinctly characterized under suitable model assumptions; AMP can also be tailored to the empirical distribution of the signal entries, and for a wide class of estimation problems, AMP is conjectured to be optimal among all polynomial-time algorithms. However, a major issue of AMP is that in many models (such as phase retrieval), it requires an initialization correlated with the ground-truth signal and independent from the measurement matrix. Assuming that such an initialization is available is typically not realistic. In this paper, we solve this problem by proposing an AMP algorithm initialized with a spectral estimator. With such an initialization, the standard AMP analysis fails since the spectral estimator depends in a complicated way on the design matrix. Our main contribution is a rigorous characterization of the performance of AMP with spectral initialization in the high-dimensional limit. The key technical idea is to define and analyze a two-phase artificial AMP algorithm that first produces the spectral estimator, and then closely approximates the iterates of the true AMP. We also provide numerical results that demonstrate the validity of the proposed approach.
MLAug 7, 2020
Optimal Combination of Linear and Spectral Estimators for Generalized Linear ModelsMarco Mondelli, Christos Thrampoulidis, Ramji Venkataramanan
We study the problem of recovering an unknown signal $\boldsymbol x$ given measurements obtained from a generalized linear model with a Gaussian sensing matrix. Two popular solutions are based on a linear estimator $\hat{\boldsymbol x}^{\rm L}$ and a spectral estimator $\hat{\boldsymbol x}^{\rm s}$. The former is a data-dependent linear combination of the columns of the measurement matrix, and its analysis is quite simple. The latter is the principal eigenvector of a data-dependent matrix, and a recent line of work has studied its performance. In this paper, we show how to optimally combine $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$. At the heart of our analysis is the exact characterization of the joint empirical distribution of $(\boldsymbol x, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol x}^{\rm s})$ in the high-dimensional limit. This allows us to compute the Bayes-optimal combination of $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$, given the limiting distribution of the signal $\boldsymbol x$. When the distribution of the signal is Gaussian, then the Bayes-optimal combination has the form $θ\hat{\boldsymbol x}^{\rm L}+\hat{\boldsymbol x}^{\rm s}$ and we derive the optimal combination coefficient. In order to establish the limiting distribution of $(\boldsymbol x, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol x}^{\rm s})$, we design and analyze an Approximate Message Passing (AMP) algorithm whose iterates give $\hat{\boldsymbol x}^{\rm L}$ and approach $\hat{\boldsymbol x}^{\rm s}$. Numerical simulations demonstrate the improvement of the proposed combination with respect to the two methods considered separately.
STNov 6, 2017
Estimation of Low-Rank Matrices via Approximate Message PassingAndrea Montanari, Ramji Venkataramanan
Consider the problem of estimating a low-rank matrix when its entries are perturbed by Gaussian noise. If the empirical distribution of the entries of the spikes is known, optimal estimators that exploit this knowledge can substantially outperform simple spectral approaches. Recent work characterizes the asymptotic accuracy of Bayes-optimal estimators in the high-dimensional limit. In this paper we present a practical algorithm that can achieve Bayes-optimal accuracy above the spectral threshold. A bold conjecture from statistical physics posits that no polynomial-time algorithm achieves optimal error below the same threshold (unless the best estimator is trivial). Our approach uses Approximate Message Passing (AMP) in conjunction with a spectral initialization. AMP algorithms have proved successful in a variety of statistical estimation tasks, and are amenable to exact asymptotic analysis via state evolution. Unfortunately, state evolution is uninformative when the algorithm is initialized near an unstable fixed point, as often happens in low-rank matrix estimation. We develop a new analysis of AMP that allows for spectral initializations. Our main theorem is general and applies beyond matrix estimation. However, we use it to derive detailed predictions for the problem of estimating a rank-one matrix in noise. Special cases of this problem are closely related---via universality arguments---to the network community detection problem for two asymmetric communities. For general rank-one models, we show that AMP can be used to construct confidence intervals and control false discovery rate. We provide illustrations of the general methodology by considering the cases of sparse low-rank matrices and of block-constant low-rank matrices with symmetric blocks (we refer to the latter as to the `Gaussian Block Model').
ITJul 28, 2017
Empirical Bayes Estimators for High-Dimensional Sparse VectorsPavan Srinath, Ramji Venkataramanan
The problem of estimating a high-dimensional sparse vector $\boldsymbolθ \in \mathbb{R}^n$ from an observation in i.i.d. Gaussian noise is considered. The performance is measured using squared-error loss. An empirical Bayes shrinkage estimator, derived using a Bernoulli-Gaussian prior, is analyzed and compared with the well-known soft-thresholding estimator. We obtain concentration inequalities for the Stein's unbiased risk estimate and the loss function of both estimators. The results show that for large $n$, both the risk estimate and the loss function concentrate on deterministic values close to the true risk. Depending on the underlying $\boldsymbolθ$, either the proposed empirical Bayes (eBayes) estimator or soft-thresholding may have smaller loss. We consider a hybrid estimator that attempts to pick the better of the soft-thresholding estimator and the eBayes estimator by comparing their risk estimates. It is shown that: i) the loss of the hybrid estimator concentrates on the minimum of the losses of the two competing estimators, and ii) the risk of the hybrid estimator is within order $\frac{1}{\sqrt{n}}$ of the minimum of the two risks. Simulation results are provided to support the theoretical results. Finally, we use the eBayes and hybrid estimators as denoisers in the approximate message passing (AMP) algorithm for compressed sensing, and show that their performance is superior to the soft-thresholding denoiser in a wide range of settings.
ITJun 14, 2017
A strong converse bound for multiple hypothesis testing, with applications to high-dimensional estimationRamji Venkataramanan, Oliver Johnson
In statistical inference problems, we wish to obtain lower bounds on the minimax risk, that is to bound the performance of any possible estimator. A standard technique to obtain risk lower bounds involves the use of Fano's inequality. In an information-theoretic setting, it is known that Fano's inequality typically does not give a sharp converse result (error lower bound) for channel coding problems. Moreover, recent work has shown that an argument based on binary hypothesis testing gives tighter results. We adapt this technique to the statistical setting, and argue that Fano's inequality can always be replaced by this approach to obtain tighter lower bounds that can be easily computed and are asymptotically sharp. We illustrate our technique in three applications: density estimation, active learning of a binary classifier, and compressed sensing, obtaining tighter risk lower bounds in each case.
ITJun 6, 2016
Finite Sample Analysis of Approximate Message Passing AlgorithmsCynthia Rush, Ramji Venkataramanan
Approximate message passing (AMP) refers to a class of efficient algorithms for statistical estimation in high-dimensional problems such as compressed sensing and low-rank matrix estimation. This paper analyzes the performance of AMP in the regime where the problem dimension is large but finite. For concreteness, we consider the setting of high-dimensional regression, where the goal is to estimate a high-dimensional vector $β_0$ from a noisy measurement $y=A β_0 + w$. AMP is a low-complexity, scalable algorithm for this problem. Under suitable assumptions on the measurement matrix $A$, AMP has the attractive feature that its performance can be accurately characterized in the large system limit by a simple scalar iteration called state evolution. Previous proofs of the validity of state evolution have all been asymptotic convergence results. In this paper, we derive a concentration inequality for AMP with i.i.d. Gaussian measurement matrices with finite size $n \times N$. The result shows that the probability of deviation from the state evolution prediction falls exponentially in $n$. This provides theoretical support for empirical findings that have demonstrated excellent agreement of AMP performance with state evolution predictions for moderately large dimensions. The concentration inequality also indicates that the number of AMP iterations $t$ can grow no faster than order $\frac{\log n}{\log \log n}$ for the performance to be close to the state evolution predictions with high probability. The analysis can be extended to obtain similar non-asymptotic results for AMP in other settings such as low-rank matrix estimation.
ITFeb 1, 2016
Cluster-Seeking James-Stein EstimatorsK. Pavan Srinath, Ramji Venkataramanan
This paper considers the problem of estimating a high-dimensional vector of parameters $\boldsymbolθ \in \mathbb{R}^n$ from a noisy observation. The noise vector is i.i.d. Gaussian with known variance. For a squared-error loss function, the James-Stein (JS) estimator is known to dominate the simple maximum-likelihood (ML) estimator when the dimension $n$ exceeds two. The JS-estimator shrinks the observed vector towards the origin, and the risk reduction over the ML-estimator is greatest for $\boldsymbolθ$ that lie close to the origin. JS-estimators can be generalized to shrink the data towards any target subspace. Such estimators also dominate the ML-estimator, but the risk reduction is significant only when $\boldsymbolθ$ lies close to the subspace. This leads to the question: in the absence of prior information about $\boldsymbolθ$, how do we design estimators that give significant risk reduction over the ML-estimator for a wide range of $\boldsymbolθ$? In this paper, we propose shrinkage estimators that attempt to infer the structure of $\boldsymbolθ$ from the observed data in order to construct a good attracting subspace. In particular, the components of the observed vector are separated into clusters, and the elements in each cluster shrunk towards a common attractor. The number of clusters and the attractor for each cluster are determined from the observed vector. We provide concentration results for the squared-error loss and convergence results for the risk of the proposed estimators. The results show that the estimators give significant risk reduction over the ML-estimator for a wide range of $\boldsymbolθ$, particularly for large $n$. Simulation results are provided to support the theoretical claims.
ITJan 23, 2015
Capacity-achieving Sparse Superposition Codes via Approximate Message Passing DecodingCynthia Rush, Adam Greig, Ramji Venkataramanan
Sparse superposition codes were recently introduced by Barron and Joseph for reliable communication over the AWGN channel at rates approaching the channel capacity. The codebook is defined in terms of a Gaussian design matrix, and codewords are sparse linear combinations of columns of the matrix. In this paper, we propose an approximate message passing decoder for sparse superposition codes, whose decoding complexity scales linearly with the size of the design matrix. The performance of the decoder is rigorously analyzed and it is shown to asymptotically achieve the AWGN capacity with an appropriate power allocation. Simulation results are provided to demonstrate the performance of the decoder at finite blocklengths. We introduce a power allocation scheme to improve the empirical performance, and demonstrate how the decoding complexity can be significantly reduced by using Hadamard design matrices.
ITFeb 3, 2012
Lossy Compression via Sparse Linear Regression: Performance under Minimum-distance EncodingRamji Venkataramanan, Antony Joseph, Sekhar Tatikonda
We study a new class of codes for lossy compression with the squared-error distortion criterion, designed using the statistical framework of high-dimensional linear regression. Codewords are linear combinations of subsets of columns of a design matrix. Called a Sparse Superposition or Sparse Regression codebook, this structure is motivated by an analogous construction proposed recently by Barron and Joseph for communication over an AWGN channel. For i.i.d Gaussian sources and minimum-distance encoding, we show that such a code can attain the Shannon rate-distortion function with the optimal error exponent, for all distortions below a specified value. It is also shown that sparse regression codes are robust in the following sense: a codebook designed to compress an i.i.d Gaussian source of variance $σ^2$ with (squared-error) distortion $D$ can compress any ergodic source of variance less than $σ^2$ to within distortion $D$. Thus the sparse regression ensemble retains many of the good covering properties of the i.i.d random Gaussian ensemble, while having having a compact representation in terms of a matrix whose size is a low-order polynomial in the block-length.