ML · LG · Jul 10, 2024

Split Conformal Prediction under Data Contamination

arXiv:2407.07700v3 · 10 citations · h-index: 35
Originality: Incremental advance
AI Analysis

This work addresses robustness issues in conformal prediction for practitioners working with contaminated data. It is incremental in that it builds on existing split conformal methods.

The paper studies the robustness of split conformal prediction under data contamination. It shows how a small fraction of corrupted calibration data affects coverage and efficiency on clean test points, and proposes an adjustment for the classification setting, called Contamination Robust Conformal Prediction, which it validates on synthetic and real datasets.
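For reference, the standard split conformal construction that the paper builds on can be written as below. The notation is ours, and the mixture form of the contamination is an assumption made for illustration: the abstract only states that a small fraction of the calibration scores come from a different distribution than the bulk.

```latex
\begin{aligned}
&s_i = s(X_i, Y_i), \quad i = 1, \dots, n \qquad \text{(calibration scores)} \\
&\hat q = s_{(k)}, \quad k = \lceil (n+1)(1-\alpha) \rceil \le n \qquad \text{($k$-th smallest score)} \\
&C(x) = \{\, y : s(x, y) \le \hat q \,\} \\
&\text{exchangeable data:} \quad \mathbb{P}\big(Y_{n+1} \in C(X_{n+1})\big) \ge 1 - \alpha \\
&\text{contamination (assumed):} \quad s_i \sim (1-\varepsilon)\,P + \varepsilon\,Q, \qquad s_{n+1} \sim P
\end{aligned}
```

Because $\hat q$ is an empirical quantile of the mixed calibration sample, it tracks the $(1-\alpha)$ quantile of $(1-\varepsilon)P + \varepsilon Q$ rather than that of $P$, so the coverage on clean test points, $\mathbb{P}_{S \sim P}(S \le \hat q)$, need not equal $1-\alpha$.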

Conformal prediction is a non-parametric technique for constructing prediction intervals or sets from arbitrary predictive models under the assumption that the data is exchangeable. It is popular because it comes with theoretical guarantees on the marginal coverage of the prediction sets, and the split conformal prediction variant has a very low computational cost compared to model training. We study the robustness of split conformal prediction in a data contamination setting, where we assume a small fraction of the calibration scores are drawn from a different distribution than the bulk. We quantify the impact of the corrupted data on the coverage and efficiency of the constructed sets when evaluated on "clean" test points, and verify our results with numerical experiments. Moreover, we propose an adjustment in the classification setting which we call Contamination Robust Conformal Prediction, and verify the efficacy of our approach using both synthetic and real datasets.
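A minimal sketch of the setting described in the abstract, using plain NumPy. It implements standard split conformal prediction (not the paper's Contamination Robust Conformal Prediction adjustment) and a toy contamination experiment. The function names, the classification score 1 - p_y(x), and the Gaussian choices P = N(0, 1) and Q = N(shift, 1) for clean and corrupted scores are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def conformal_quantile(cal_scores: np.ndarray, alpha: float) -> float:
    """Split conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: fall back to the trivial set.
        return np.inf
    return float(np.sort(cal_scores)[k - 1])


def prediction_sets(probs: np.ndarray, q_hat: float) -> list[list[int]]:
    """Classification sets C(x) = {y : s(x, y) <= q_hat}.

    Assumed score: s(x, y) = 1 - p_y(x), where probs[i, y] is the model's
    predicted probability of class y for test point i.
    """
    scores = 1.0 - probs
    return [np.flatnonzero(row <= q_hat).tolist() for row in scores]


def clean_coverage(eps: float, n_cal: int = 1000, n_test: int = 100_000,
                   alpha: float = 0.1, shift: float = 2.0, seed: int = 0) -> float:
    """Empirical coverage on clean test scores when a fraction eps of the
    calibration scores is drawn from a shifted distribution Q (toy assumption)."""
    rng = np.random.default_rng(seed)
    n_bad = int(round(eps * n_cal))
    clean = rng.normal(0.0, 1.0, size=n_cal - n_bad)    # P = N(0, 1)
    corrupted = rng.normal(shift, 1.0, size=n_bad)       # Q = N(shift, 1)
    q_hat = conformal_quantile(np.concatenate([clean, corrupted]), alpha)
    test = rng.normal(0.0, 1.0, size=n_test)             # clean test scores from P
    return float(np.mean(test <= q_hat))


for eps in (0.0, 0.05, 0.1):
    print(f"eps={eps:.2f}  clean coverage={clean_coverage(eps):.3f}")
```

In this toy setup the corrupted scores are shifted upward, so they push the threshold up: clean coverage tends to rise above 1 - alpha and the prediction sets grow (lower efficiency). A downward shift would tend to have the opposite effect on coverage.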

Code Implementations: 1 repo