Dmitry Panchenko

PRJul 14, 2021

Performance of Bayesian linear regression in a model with mismatch

Jean Barbier, Wei-Kuo Chen, Dmitry Panchenko et al.

In this paper we analyze, for a model of linear regression with gaussian covariates, the performance of a Bayesian estimator given by the mean of a log-concave posterior distribution with gaussian prior, in the high-dimensional limit where the number of samples and the covariates' dimension are large and proportional. Although the high-dimensional analysis of Bayesian estimators has been previously studied for Bayesian-optimal linear regression where the correct posterior is used for inference, much less is known when there is a mismatch. Here we consider a model in which the responses are corrupted by gaussian noise and are known to be generated as linear combinations of the covariates, but the distributions of the ground-truth regression coefficients and of the noise are unknown. This regression task can be rephrased as a statistical mechanics model known as the Gardner spin glass, an analogy which we exploit. Using a leave-one-out approach we characterize the mean-square error for the regression coefficients. We also derive the log-normalizing constant of the posterior. Similar models have been studied by Shcherbina and Tirozzi and by Talagrand, but our arguments are much more straightforward. An interesting consequence of our analysis is that in the quadratic loss case, the performance of the Bayesian estimator is independent of a global "temperature" hyperparameter and matches the ridge estimator: sampling and optimizing are equally good.

PRSep 27, 2020

Strong replica symmetry for high-dimensional disordered log-concave Gibbs measures

Jean Barbier, Dmitry Panchenko, Manuel Sáenz

We consider a generic class of log-concave, possibly random, (Gibbs) measures. We prove the concentration of an infinite family of order parameters called multioverlaps. Because they completely parametrise the quenched Gibbs measure of the system, this implies a simple representation of the asymptotic Gibbs measures, as well as the decoupling of the variables in a strong sense. These results may prove themselves useful in several contexts. In particular in machine learning and high-dimensional inference, log-concave measures appear in convex empirical risk minimisation, maximum a-posteriori inference or M-estimation. We believe that they may be applicable in establishing some type of "replica symmetric formulas" for the free energy, inference or generalisation error in such settings.

Dmitry Panchenko

2 Papers