ME · ML · Apr 25, 2018

The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression

arXiv:1804.09753v1 · 149 citations
Originality: Highly original
AI Analysis

This paper provides a foundational theoretical result for statisticians and machine learning practitioners who fit logistic regression in high-dimensional settings: it identifies the conditions under which the maximum likelihood estimate exists at all.

This paper determines when the maximum likelihood estimate (MLE) exists in high-dimensional logistic regression with Gaussian covariates. It establishes a sharp phase transition in the limit where the ratio of features to samples, $p/n$, converges to a constant $\kappa$: if $\kappa$ lies above an explicit boundary curve $h_{\text{MLE}}$, the MLE asymptotically does not exist with probability one; if $\kappa$ lies below it, the MLE asymptotically exists with probability one.

This paper rigorously establishes that the existence of the maximum likelihood estimate (MLE) in high-dimensional logistic regression models with Gaussian covariates undergoes a sharp "phase transition". We introduce an explicit boundary curve $h_{\text{MLE}}$, parameterized by two scalars measuring the overall magnitude of the unknown sequence of regression coefficients, with the following property: in the limit of large sample sizes $n$ and number of features $p$ proportioned in such a way that $p/n \rightarrow \kappa$, we show that if the problem is sufficiently high dimensional in the sense that $\kappa > h_{\text{MLE}}$, then the MLE does not exist with probability one. Conversely, if $\kappa < h_{\text{MLE}}$, the MLE asymptotically exists with probability one.
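The transition described in the abstract can be observed numerically. The sketch below is an illustration under stated assumptions, not the paper's method. It relies on the classical fact that the logistic MLE fails to exist when the two classes are completely separated by a hyperplane, and it tests for separation with a linear feasibility program. To keep it short it makes three simplifying assumptions: there is no intercept, the labels are drawn independently of the covariates (the null model), and quasi-complete separation is ignored. In this null case the classical result of Cover (1965) puts the threshold at $\kappa = 1/2$; the general curve $h_{\text{MLE}}$ from the paper is not computed here. The function names `is_separable` and `separation_frequency` are hypothetical, and the code assumes NumPy and SciPy are installed.

```python
import numpy as np
from scipy.optimize import linprog


def is_separable(X, y):
    """Return True if some b satisfies y_i * <x_i, b> >= 1 for every i.

    X: (n, p) array of covariates; y: (n,) array of labels in {-1, +1}.
    Feasibility of this linear program is equivalent to complete separation,
    in which case the logistic log-likelihood has no finite maximizer.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # linprog expects A_ub @ b <= b_ub, so flip the sign of the margin constraints.
    A_ub = -(y[:, None] * X)
    b_ub = -np.ones(n)
    res = linprog(c=np.zeros(p), A_ub=A_ub, b_ub=b_ub,
                  bounds=(None, None), method="highs")
    return res.status == 0  # 0: a feasible point was found; 2: infeasible


def separation_frequency(n, kappa, trials=20, seed=0):
    """Fraction of simulated null-model datasets that are completely separable."""
    rng = np.random.default_rng(seed)
    p = int(round(kappa * n))
    hits = 0
    for _ in range(trials):
        X = rng.standard_normal((n, p))       # Gaussian covariates
        y = rng.choice([-1.0, 1.0], size=n)   # labels independent of X
        hits += is_separable(X, y)
    return hits / trials


if __name__ == "__main__":
    for kappa in (0.2, 0.4, 0.6, 0.8):
        freq = separation_frequency(n=200, kappa=kappa)
        print(f"kappa={kappa:.1f}  fraction separable (no MLE): {freq:.2f}")
```

With a few hundred samples the reported fraction should sit near zero well below $\kappa = 1/2$ and near one well above it. Values of $\kappa$ close to the threshold need larger $n$ before the transition looks sharp.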
