STITSPMLFeb 17, 2020

Sharp Asymptotics and Optimal Performance for Inference in Binary Models

arXiv:2002.07284v229 citations
AI Analysis

This provides theoretical guarantees for inference in binary models, which is incremental but important for statistical learning applications.

The paper tackles the problem of predicting statistical performance for convex empirical risk minimization in high-dimensional binary models, showing that the derived bound on optimal performance is tight for popular models like Logistic and Probit, with least-squares achieving at least 0.997 and 0.98 times the optimal performance, respectively.

We study convex empirical risk minimization for high-dimensional inference in binary models. Our first result sharply predicts the statistical performance of such estimators in the linear asymptotic regime under isotropic Gaussian features. Importantly, the predictions hold for a wide class of convex loss functions, which we exploit in order to prove a bound on the best achievable performance among them. Notably, we show that the proposed bound is tight for popular binary models (such as Signed, Logistic or Probit), by constructing appropriate loss functions that achieve it. More interestingly, for binary linear classification under the Logistic and Probit models, we prove that the performance of least-squares is no worse than 0.997 and 0.98 times the optimal one. Numerical simulations corroborate our theoretical findings and suggest they are accurate even for relatively small problem dimensions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes