ME ML Feb 2, 2019

High-dimensional semi-supervised learning: in search for optimal inference of the mean

arXiv:1902.00772v1
25 citations
Originality: Incremental advance
AI Analysis

This paper addresses inference problems in high-dimensional settings. It is aimed at statisticians and data scientists and offers incremental improvements over existing methods.

The paper tackles high-dimensional semi-supervised learning for mean and variance inference, developing estimators that extend low-dimensional results and provide asymptotic distributions and confidence intervals with efficiency gains over sample means and variances.

We provide a high-dimensional semi-supervised inference framework focused on the mean and variance of the response. Our data comprise an extensive set of observations of the covariate vectors and a much smaller set of labeled observations in which we observe both the response and the covariates. We allow the dimension of the covariates to be much larger than the sample size and impose weak conditions on the statistical form of the data. We provide new estimators of the mean and variance of the response that extend some recent results for low-dimensional models. In particular, at times we do not require consistent estimation of the functional form of the data. Together with estimates of the population mean and variance, we provide their asymptotic distributions and confidence intervals, and we showcase gains in efficiency compared to the sample mean and variance. Our procedure, with minor modifications, is then shown to contribute to inference about average treatment effects. We also investigate the robustness of estimation and coverage and showcase the wide applicability and generality of the proposed method.
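To make the semi-supervised setup concrete, the following is a minimal sketch of a regression-adjusted mean estimate that uses unlabeled covariates. It is not the estimator proposed in the paper: the ridge fit is a stand-in for whatever high-dimensional regression one prefers, and the names `ridge_fit`, `semi_supervised_mean`, and the penalty `lam` are hypothetical choices made for illustration only.

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Ridge regression on centered data; returns the coefficient vector.

    Assumption: ridge is used purely as an illustrative stand-in for a
    high-dimensional regression method.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n = Xc.shape[0]
    # Dual form: solves an n x n system, so it stays cheap when p >> n.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), yc)
    return Xc.T @ alpha


def semi_supervised_mean(X_lab, y_lab, X_unlab, lam=1.0):
    """Sample mean of y, shifted by the fitted effect of the covariate-mean gap.

    The labeled sample mean is corrected by how far the labeled covariate
    mean sits from the covariate mean over labeled plus unlabeled data.
    """
    coef = ridge_fit(X_lab, y_lab, lam)
    x_bar_all = np.vstack([X_lab, X_unlab]).mean(axis=0)
    x_bar_lab = X_lab.mean(axis=0)
    return y_lab.mean() + (x_bar_all - x_bar_lab) @ coef


# Usage: simulated data with more covariates than labeled observations.
rng = np.random.default_rng(0)
n_lab, n_unlab, p = 50, 2000, 200
X_lab = rng.normal(size=(n_lab, p))
X_unlab = rng.normal(size=(n_unlab, p))
y_lab = 1.0 + X_lab[:, 0] + rng.normal(scale=0.5, size=n_lab)
print(semi_supervised_mean(X_lab, y_lab, X_unlab))
```

When there are no unlabeled observations, the two covariate means coincide and the estimate reduces to the ordinary sample mean of the labeled responses. Any efficiency gain comes from the unlabeled covariates pinning down the covariate mean more precisely.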

