ML · LG · Aug 13, 2025

Prediction-Powered Inference with Inverse Probability Weighting

arXiv:2508.10149v1 · 3 citations · h-index: 15

Originality: Incremental advance

AI Analysis

This work addresses labeling probabilities that vary across units, a challenge for researchers who apply machine learning models to partially labeled datasets. It is an incremental advance that brings survey sampling techniques into prediction-powered inference.

The paper tackles the problem of valid statistical inference with partially labeled data under informative labeling by extending the prediction-powered inference framework with inverse probability weighting, showing in simulations that the method retains nominal coverage and variance reduction benefits.

Prediction-powered inference (PPI) is a recent framework for valid statistical inference with partially labeled data, combining model-based predictions on a large unlabeled set with bias correction from a smaller labeled subset. We show that PPI can be extended to handle informative labeling by replacing its unweighted bias-correction term with an inverse probability weighted (IPW) version, using the classical Horvitz–Thompson or Hájek forms. This connection unites design-based survey sampling ideas with modern prediction-assisted inference, yielding estimators that remain valid when labeling probabilities vary across units. We consider the common setting where the inclusion probabilities are not known but estimated from a correctly specified model. In simulations, the performance of IPW-adjusted PPI with estimated propensities closely matches the known-probability case, retaining both nominal coverage and the variance-reduction benefits of PPI.
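
As a rough illustration of the idea in the abstract, the sketch below computes an IPW-adjusted PPI point estimate of a population mean. It assumes a single sample of N units, each with a model prediction, a labeling indicator, and a labeling probability that is either known or already estimated. The function name `ipw_ppi_mean` and its arguments are hypothetical, chosen for this example, and the code is not taken from the paper. It covers only the point estimate, not confidence intervals or propensity estimation.

```python
import numpy as np


def ipw_ppi_mean(preds, labels, labeled, probs, form="hajek"):
    """IPW-adjusted PPI point estimate of a mean (illustrative sketch).

    preds   : model predictions f(X_i) for all N units
    labels  : observed Y_i; entries for unlabeled units are ignored (may be NaN)
    labeled : boolean indicator R_i, True where Y_i was observed
    probs   : labeling probabilities pi_i > 0, known or estimated (assumption)
    form    : "ht" for Horvitz-Thompson, "hajek" for the Hajek ratio form
    """
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    labeled = np.asarray(labeled, dtype=bool)
    probs = np.asarray(probs, dtype=float)

    # Inverse probability weights R_i / pi_i (zero for unlabeled units).
    weights = labeled / probs
    # Prediction residuals Y_i - f(X_i), set to zero where Y_i is unobserved.
    resid = np.where(labeled, labels - preds, 0.0)
    weighted_sum = np.sum(weights * resid)

    if form == "ht":
        # Horvitz-Thompson: normalize by the full sample size N.
        correction = weighted_sum / preds.size
    elif form == "hajek":
        # Hajek: normalize by the sum of the weights.
        correction = weighted_sum / np.sum(weights)
    else:
        raise ValueError("form must be 'ht' or 'hajek'")

    # Mean prediction over all units plus the weighted bias correction.
    return preds.mean() + correction


# Toy data: four units, two of them labeled, with unequal labeling probabilities.
preds = [1.0, 2.0, 3.0, 4.0]
labels = [2.0, np.nan, 3.0, np.nan]
labeled = [True, False, True, False]
probs = [0.25, 0.5, 0.5, 0.5]

print(ipw_ppi_mean(preds, labels, labeled, probs, form="ht"))
print(ipw_ppi_mean(preds, labels, labeled, probs, form="hajek"))
```

The two forms differ only in how the weighted residual sum is normalized: Horvitz–Thompson divides by N, while Hájek divides by the sum of the weights, a standard survey sampling choice that is often more stable when the weights vary widely.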
