Matteo Gasparin

h-index45
2papers

2 Papers

MLMar 22, 2024
Conformal online model aggregation

Matteo Gasparin, Aaditya Ramdas

Conformal prediction equips machine learning models with a reasonable notion of uncertainty quantification without making strong distributional assumptions. It wraps around any prediction model and converts point predictions into set predictions with a predefined marginal coverage guarantee. However, conformal prediction only works if we fix the underlying machine learning model in advance. A relatively unaddressed issue in conformal prediction is that of model selection and/or aggregation: given a set of prediction models, which one should we conformalize? This paper suggests that instead of performing model selection, it can be prudent and practical to perform conformal set aggregation in an online, adaptive fashion. We propose a wrapper that takes in several conformal prediction sets (themselves wrapped around black-box prediction models), and outputs a single adaptively-combined prediction set. Our method, called conformal online model aggregation (COMA), is based on combining the prediction sets from several algorithms by weighted voting, and can be thought of as a sort of online stacking of the underlying conformal sets. As long as the input sets have (distribution-free) coverage guarantees, COMA retains coverage guarantees, under a negative correlation assumption between errors and weights. We verify that the assumption holds empirically in all settings considered. COMA is well-suited for decentralized or distributed settings, where different users may have different models, and are only willing to share their prediction sets for a new test point in a black-box fashion. As we demonstrate, it is also well-suited to settings with distribution drift and shift, where model selection can be imprudent.

MLMar 3, 2025
Improving the statistical efficiency of cross-conformal prediction

Matteo Gasparin, Aaditya Ramdas

Vovk (2015) introduced cross-conformal prediction, a modification of split conformal designed to improve the width of prediction sets. The method, when trained with a miscoverage rate equal to $α$ and $n \gg K$, ensures a marginal coverage of at least $1 - 2α- 2(1-α)(K-1)/(n+K)$, where $n$ is the number of observations and $K$ denotes the number of folds. A simple modification of the method achieves coverage of at least $1-2α$. In this work, we propose new variants of both methods that yield smaller prediction sets without compromising the latter theoretical guarantees. The proposed methods are based on recent results deriving more statistically efficient combination of p-values that leverage exchangeability and randomization. Simulations confirm the theoretical findings and bring out some important tradeoffs.