ST Mar 5, 2023
Universal distribution of the empirical coverage in split conformal prediction
Paulo C. Marques F
When split conformal prediction operates in batch mode with exchangeable data, we determine the exact distribution of the empirical coverage of prediction sets produced for a finite batch of future observables, as well as the exact distribution of its almost sure limit when the batch size goes to infinity. Both distributions are universal, being determined solely by the nominal miscoverage level and the calibration sample size, thereby establishing a criterion for choosing the minimum required calibration sample size in applications.
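The universality claim can be checked numerically with a small simulation. The sketch below is not the paper's code. It assumes a trivial predictor that always outputs 0, absolute residuals as conformity scores, and i.i.d. data (a special case of exchangeability). The function names `empirical_coverage` and `simulate`, the two data distributions, and the values of `n`, `m`, and `alpha` are all illustrative choices. With continuous scores, the average empirical coverage should be close to k/(n+1), where k = ceil((1 - alpha)(n + 1)), and the spread of the empirical coverage should look the same for both data distributions.

```python
import math
import random


def empirical_coverage(n, m, alpha, sampler, rng):
    """One replication: calibrate on n scores, then measure coverage on a batch of m."""
    # Conformity scores: absolute residuals of a trivial predictor that always predicts 0
    # (an assumption made only to keep the sketch short).
    calib = sorted(abs(sampler(rng)) for _ in range(n))
    k = math.ceil((1 - alpha) * (n + 1))  # rank of the calibration quantile
    threshold = calib[k - 1] if k <= n else math.inf
    # A future observation is covered when its score does not exceed the threshold.
    covered = sum(abs(sampler(rng)) <= threshold for _ in range(m))
    return covered / m


def simulate(n, m, alpha, sampler, reps=2000, seed=0):
    """Empirical coverages over independent replications of calibration plus batch."""
    rng = random.Random(seed)
    return [empirical_coverage(n, m, alpha, sampler, rng) for _ in range(reps)]


def mean(xs):
    return sum(xs) / len(xs)


def sd(xs):
    mu = mean(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / (len(xs) - 1))


def normal(rng):
    return rng.gauss(0.0, 1.0)


def expo(rng):
    return rng.expovariate(1.0)


n, m, alpha = 100, 200, 0.1  # calibration size, batch size, nominal miscoverage level
k = math.ceil((1 - alpha) * (n + 1))

cov_normal = simulate(n, m, alpha, normal, seed=0)
cov_expo = simulate(n, m, alpha, expo, seed=1)

print("expected mean coverage:", k / (n + 1))
print("normal data:      mean", mean(cov_normal), "sd", sd(cov_normal))
print("exponential data: mean", mean(cov_expo), "sd", sd(cov_expo))
```

Rerunning with a larger `n` shrinks the spread of the empirical coverage around the nominal level, which is the kind of comparison one might use when choosing a calibration sample size.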
ML May 18, 2025
Stacked conformal prediction
Paulo C. Marques F
We consider a method for conformalizing a stacked ensemble of predictive models, showing that the potentially simple form of the meta-learner at the top of the stack enables a procedure with manageable computational cost that achieves approximate marginal validity without requiring the use of a separate calibration sample. Empirical results indicate that the method compares favorably to a standard inductive alternative.
ML Dec 11, 2021
Confidence intervals for the random forest generalization error
Paulo C. Marques F
We show that the byproducts of the standard training process of a random forest yield not only the well-known and almost computationally free out-of-bag point estimate of the model generalization error, but also a direct path to computing confidence intervals for the generalization error that avoids data splitting and model retraining. Besides the low computational cost involved in their construction, these confidence intervals are shown through simulations to have good coverage and a width that shrinks at an appropriate rate as the training sample size grows.