LG | May 7, 2025

Conformal Prediction with Corrupted Labels: Uncertain Imputation and Robust Re-weighting

Shai Feldman, Stephen Bates, Yaniv Romano

arXiv:2505.04733v2 | 11.44 citations | h-index: 11 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the problem of constructing reliable prediction sets when training labels are noisy or missing, offering incremental improvements to existing conformal prediction methods.

The paper tackles robust uncertainty quantification when training labels are corrupted. It shows that privileged conformal prediction can remain valid even with inaccurate weights, and it introduces uncertain imputation, which avoids weight estimation altogether. Both contributions come with theoretical guarantees and empirical validation on benchmarks.

We introduce a framework for robust uncertainty quantification in situations where labeled training data are corrupted, through noisy or missing labels. We build on conformal prediction, a statistical tool for generating prediction sets that cover the test label with a pre-specified probability. The validity of conformal prediction, however, holds under the i.i.d. assumption, which does not hold in our setting due to the corruptions in the data. To account for this distribution shift, the privileged conformal prediction (PCP) method proposed leveraging privileged information (PI), additional features available only during training, to re-weight the data distribution, yielding valid prediction sets under the assumption that the weights are accurate. In this work, we analyze the robustness of PCP to inaccuracies in the weights. Our analysis indicates that PCP can still yield valid uncertainty estimates even when the weights are poorly estimated. Furthermore, we introduce uncertain imputation (UI), a new conformal method that does not rely on weight estimation. Instead, we impute corrupted labels in a way that preserves their uncertainty. Our approach is supported by theoretical guarantees and validated empirically on both synthetic and real benchmarks. Finally, we show that these techniques can be integrated into a triply robust framework, ensuring statistically valid predictions as long as at least one underlying method is valid.
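
The re-weighting idea described in the abstract can be illustrated with a generic weighted split conformal quantile. This is a minimal sketch under stated assumptions, not the authors' PCP or UI implementation: the function names, the absolute-residual score used for the interval, and the availability of a weight for the test point are all assumptions made for illustration. With uniform weights it reduces to the standard split conformal quantile.

```python
import numpy as np


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Weighted (1 - alpha) quantile of calibration scores.

    Hypothetical helper: the test point contributes a point mass at +inf,
    so the function returns +inf when the calibration mass is too small.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Normalize so calibration weights plus the test weight sum to one.
    total = weights.sum() + test_weight
    probs = weights / total

    # Accumulate probability mass over the scores in increasing order.
    order = np.argsort(scores)
    sorted_scores = scores[order]
    cumulative = np.cumsum(probs[order])

    # First score whose cumulative mass reaches 1 - alpha, else +inf.
    idx = np.searchsorted(cumulative, 1.0 - alpha, side="left")
    if idx >= len(sorted_scores):
        return np.inf
    return sorted_scores[idx]


def prediction_interval(point_prediction, q):
    """Interval for the score |y - prediction| (an assumed score choice)."""
    return point_prediction - q, point_prediction + q
```

In this sketch, `scores` would be nonconformity scores computed on calibration points with uncorrupted labels, and `weights` would be estimated likelihood ratios correcting for the shift that the corruption induces. The abstract's robustness claim concerns what happens when such weights are poorly estimated; the sketch does not attempt to reproduce that analysis.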
