ML · LG · May 8, 2025

Conformal Prediction with Cellwise Outliers: A Detect-then-Impute Approach

arXiv:2505.04986v1 · 4 citations · h-index: 4 · ICML
Originality: Incremental advance
AI Analysis

This paper addresses robust uncertainty quantification in conformal prediction for practitioners working with contaminated data. The contribution is incremental, since it builds on existing detection and imputation methods.

The paper constructs prediction intervals for black-box models when test features contain cellwise outliers, which break the exchangeability assumption. It introduces a detect-then-impute framework that first identifies the outlying cells and then imputes them to restore exchangeability. The resulting algorithms include JDI-CP, which achieves a finite-sample $1-2\alpha$ coverage guarantee. In experiments, the algorithms show robust coverage and efficiency comparable to an oracle baseline.

Conformal prediction is a powerful tool for constructing prediction intervals for black-box models, providing a finite sample coverage guarantee for exchangeable data. However, this exchangeability is compromised when some entries of the test feature are contaminated, such as in the case of cellwise outliers. To address this issue, this paper introduces a novel framework called detect-then-impute conformal prediction. This framework first employs an outlier detection procedure on the test feature and then utilizes an imputation method to fill in those cells identified as outliers. To quantify the uncertainty in the processed test feature, we adaptively apply the detection and imputation procedures to the calibration set, thereby constructing exchangeable features for the conformal prediction interval of the test label. We develop two practical algorithms, PDI-CP and JDI-CP, and provide a distribution-free coverage analysis under some commonly used detection and imputation procedures. Notably, JDI-CP achieves a finite sample $1-2\alpha$ coverage guarantee. Numerical experiments on both synthetic and real datasets demonstrate that our proposed algorithms exhibit robust coverage properties and comparable efficiency to the oracle baseline.
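The detect-then-impute idea can be illustrated with a minimal split-conformal sketch. This is not the paper's PDI-CP or JDI-CP algorithm and carries no claim about their guarantees. It assumes a hypothetical cellwise detector (a robust z-score threshold `tau`), a hypothetical imputer (the column median), a least-squares model, and a simplified rule for processing the calibration set: each calibration row has its own flagged cells imputed, plus the cells flagged on the test point. All function names and parameters below are illustrative.

```python
import numpy as np

def detect_cells(x, center, scale, tau=4.0):
    """Flag cells whose robust z-score exceeds tau (assumed detector)."""
    return np.abs(x - center) / scale > tau

def impute_cells(x, mask, center):
    """Replace flagged cells with the per-column center (assumed imputer)."""
    return np.where(mask, center, x)

def detect_then_impute_interval(predict, X_cal, y_cal, x_test,
                                center, scale, alpha=0.1, tau=4.0):
    # Step 1: detect and impute suspicious cells in the test feature.
    test_mask = detect_cells(x_test, center, scale, tau)
    x_proc = impute_cells(x_test, test_mask, center)

    # Step 2 (simplifying assumption): process calibration rows the same way,
    # imputing their own flagged cells plus the cells flagged on the test point.
    cal_mask = detect_cells(X_cal, center, scale, tau) | test_mask
    X_proc = impute_cells(X_cal, cal_mask, center)

    # Step 3: standard split-conformal quantile of absolute residuals.
    scores = np.abs(y_cal - predict(X_proc))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.inf if k > n else np.sort(scores)[k - 1]

    pred = predict(x_proc[None, :])[0]
    return pred - q, pred + q, test_mask

# Synthetic demo: linear data with one contaminated test cell.
rng = np.random.default_rng(0)
d = 5
beta = np.arange(1.0, d + 1)
X = rng.normal(size=(600, d))
y = X @ beta + rng.normal(scale=0.5, size=600)
X_tr, y_tr = X[:300], y[:300]
X_cal, y_cal = X[300:], y[300:]

coef, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

def predict(A):
    return A @ coef

# Robust location and scale per column, estimated on the training split.
center = np.median(X_tr, axis=0)
scale = 1.4826 * np.median(np.abs(X_tr - center), axis=0)

x_test = rng.normal(size=d)
x_test[2] = 25.0  # inject a cellwise outlier into the third feature

lo, hi, mask = detect_then_impute_interval(
    predict, X_cal, y_cal, x_test, center, scale, alpha=0.1
)
print("flagged cells:", np.flatnonzero(mask))
print("interval:", (lo, hi))
```

Imputing the test-flagged cells in the calibration rows as well widens the interval, because the calibration residuals then reflect the information lost by imputation. That is the intuition behind processing both sets consistently, although the paper's actual procedures may differ in how the calibration masks are chosen.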
