Luben M. C. Cabezas

ML
h-index10
6papers
86citations
Novelty62%
AI Score49

6 Papers

51.8MLMay 18
Conformal Prediction via Transported Beta Laws

Thiago R. Ramos, Helton Graziadei, Luben M. C. Cabezas

Split conformal prediction provides finite-sample marginal coverage under exchangeability, but this guarantee averages over the random calibration sample. We study instead the law of the calibration-conditional coverage induced by a realized conformal threshold. In the continuous i.i.d. setting this law is exactly $Beta(k,n+1-k)$, so the usual marginal guarantee corresponds to its mean. We take this beta law as a finite-sample reference object and quantify departures from it using Wasserstein distances on $[0,1]$. The framework yields direct bounds on marginal coverage gaps and on bad-calibration probabilities, and separates different sources of non-i.i.d. behavior according to how they deform the beta reference: test-side shift acts through a transport map on the coverage scale, while calibration dependence changes the order-statistic law itself. We instantiate the framework in scale-shift, clustered, and stationary mixing settings, where the induced deformations can be characterized explicitly or through Berry-Esseen approximations. Simulations on dependent processes confirm that the first-order approximation tracks the empirical Wasserstein distance even at moderate sample sizes.

MLFeb 12, 2024
Regression Trees for Fast and Adaptive Prediction Intervals

Luben M. C. Cabezas, Mateus P. Otto, Rafael Izbicki et al.

Predictive models make mistakes. Hence, there is a need to quantify the uncertainty associated with their predictions. Conformal inference has emerged as a powerful tool to create statistically valid prediction regions around point predictions, but its naive application to regression problems yields non-adaptive regions. New conformal scores, often relying upon quantile regressors or conditional density estimators, aim to address this limitation. Although they are useful for creating prediction bands, these scores are detached from the original goal of quantifying the uncertainty around an arbitrary predictive model. This paper presents a new, model-agnostic family of methods to calibrate prediction intervals for regression problems with local coverage guarantees. Our approach is based on pursuing the coarsest partition of the feature space that approximates conditional coverage. We create this partition by training regression trees and Random Forests on conformity scores. Our proposal is versatile, as it applies to various conformity scores and prediction settings and demonstrates superior scalability and performance compared to established baselines in simulated and real-world datasets. We provide a Python package clover that implements our methods using the standard scikit-learn interface.

MLFeb 10, 2025
Epistemic Uncertainty in Conformal Scores: A Unified Approach

Luben M. C. Cabezas, Vagner S. Santos, Thiago R. Ramos et al.

Conformal prediction methods create prediction bands with distribution-free guarantees but do not explicitly capture epistemic uncertainty, which can lead to overconfident predictions in data-sparse regions. Although recent conformal scores have been developed to address this limitation, they are typically designed for specific tasks, such as regression or quantile regression. Moreover, they rely on particular modeling choices for epistemic uncertainty, restricting their applicability. We introduce $\texttt{EPICSCORE}$, a model-agnostic approach that enhances any conformal score by explicitly integrating epistemic uncertainty. Leveraging Bayesian techniques such as Gaussian Processes, Monte Carlo Dropout, or Bayesian Additive Regression Trees, $\texttt{EPICSCORE}$ adaptively expands predictive intervals in regions with limited data while maintaining compact intervals where data is abundant. As with any conformal method, it preserves finite-sample marginal coverage. Additionally, it also achieves asymptotic conditional coverage. Experiments demonstrate its good performance compared to existing methods. Designed for compatibility with any Bayesian model, but equipped with distribution-free guarantees, $\texttt{EPICSCORE}$ provides a general-purpose framework for uncertainty quantification in prediction problems.

MLNov 22, 2025
Conformal Prediction for Compositional Data

Lucas P. Amaral, Luben M. C. Cabezas, Thiago R. Ramos et al.

Dirichlet regression models are suitable for compositional data, in which the response variable represents proportions that sum to one. However, there are still no well-established methods for constructing valid prediction sets in this context, especially considering the geometry of the compositional space. In this work, we investigate conformal prediction-based strategies for constructing valid predictive regions in Dirichlet regression models. We evaluate three distinct approaches: a method based on quantile residuals, an approximate construction of highest density regions (HDR), and an adaptation of the approximate HDR using grid-based discretization over the simplex. The performance of the methods was analyzed through simulation studies under different scenarios, varying the model complexity, response dimensionality, and covariate structure. The results indicated that the HDR approximation approach exhibits good robustness in terms of coverage, while the grid discretization proved effective in reducing overcoverage and the area of the prediction region compared to the original method. The quantile method provided larger prediction regions compared to the grid method, while maintaining adequate coverage. The methodologies were also applied to two real datasets: one concerning sleep stages and another on biomass allocation in plants. In both cases, the proposed methods demonstrated practical feasibility and produced coherent interpretations within the compositional space. Finally, we discuss possible extensions of this work

MLAug 23, 2025
CP4SBI: Local Conformal Calibration of Credible Sets in Simulation-Based Inference

Luben M. C. Cabezas, Vagner S. Santos, Thiago R. Ramos et al.

Current experimental scientists have been increasingly relying on simulation-based inference (SBI) to invert complex non-linear models with intractable likelihoods. However, posterior approximations obtained with SBI are often miscalibrated, causing credible regions to undercover true parameters. We develop $\texttt{CP4SBI}$, a model-agnostic conformal calibration framework that constructs credible sets with local Bayesian coverage. Our two proposed variants, namely local calibration via regression trees and CDF-based calibration, enable finite-sample local coverage guarantees for any scoring function, including HPD, symmetric, and quantile-based regions. Experiments on widely used SBI benchmarks demonstrate that our approach improves the quality of uncertainty quantification for neural posterior estimators using both normalizing flows and score-diffusion modeling.

MENov 30, 2021
Hierarchical clustering: visualization, feature importance and model selection

Luben M. C. Cabezas, Rafael Izbicki, Rafael B. Stern

We propose methods for the analysis of hierarchical clustering that fully use the multi-resolution structure provided by a dendrogram. Specifically, we propose a loss for choosing between clustering methods, a feature importance score and a graphical tool for visualizing the segmentation of features in a dendrogram. Current approaches to these tasks lead to loss of information since they require the user to generate a single partition of the instances by cutting the dendrogram at a specified level. Our proposed methods, instead, use the full structure of the dendrogram. The key insight behind the proposed methods is to view a dendrogram as a phylogeny. This analogy permits the assignment of a feature value to each internal node of a tree through an evolutionary model. Real and simulated datasets provide evidence that our proposed framework has desirable outcomes and gives more insights than state-of-art approaches. We provide an R package that implements our methods.