Conformal Prediction for Multimodal Regression
This work addresses the need for reliable uncertainty estimation in AI applications that involve multimodal data. The contribution is incremental, since it builds on existing conformal prediction frameworks.
The paper extends conformal prediction to multimodal data by introducing a method that constructs prediction intervals from internal neural network features computed from images and text, bringing distribution-free uncertainty quantification with coverage guarantees to a broader range of problems.
This paper introduces multimodal conformal regression. Traditionally confined to scenarios with solely numerical input features, conformal prediction is now extended to multimodal contexts through our methodology, which harnesses internal features from complex neural network architectures processing images and unstructured text. Our findings highlight the potential for internal neural network features, extracted from convergence points where multimodal information is combined, to be used by conformal prediction to construct prediction intervals (PIs). This capability paves new paths for deploying conformal prediction in domains abundant with multimodal data, enabling a broader range of problems to benefit from guaranteed distribution-free uncertainty quantification.
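As a rough illustration of the general idea, and not the paper's actual pipeline, the sketch below applies standard split conformal regression to a feature matrix. The synthetic `X_*` arrays are a hypothetical stand-in for features taken from the layer where image and text information is combined. The ridge regression head, the absolute-residual score, and the name `split_conformal_interval` are assumptions chosen for brevity.

```python
import numpy as np


def split_conformal_interval(cal_pred, cal_y, test_pred, alpha=0.1):
    """Split conformal prediction intervals from absolute calibration residuals."""
    cal_pred = np.asarray(cal_pred, dtype=float)
    cal_y = np.asarray(cal_y, dtype=float)
    test_pred = np.asarray(test_pred, dtype=float)
    n = cal_y.shape[0]
    # Nonconformity score: absolute residual on the held-out calibration set.
    scores = np.sort(np.abs(cal_y - cal_pred))
    # Finite-sample corrected rank, ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # If the rank exceeds n, only an infinite interval keeps the guarantee.
    q = np.inf if k > n else scores[k - 1]
    return test_pred - q, test_pred + q


rng = np.random.default_rng(0)
d = 8  # hypothetical dimension of the fused image+text feature vector
w_true = rng.normal(size=d)


def make_split(n):
    # Assumption: synthetic features standing in for internal network features.
    X = rng.normal(size=(n, d))
    y = X @ w_true + rng.normal(scale=0.5, size=n)
    return X, y


X_train, y_train = make_split(500)
X_cal, y_cal = make_split(500)
X_test, y_test = make_split(2000)

# Ridge regression head fitted on the training split only.
lam = 1e-3
beta = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d), X_train.T @ y_train)

lower, upper = split_conformal_interval(X_cal @ beta, y_cal, X_test @ beta, alpha=0.1)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage: {coverage:.3f}")
```

When the calibration and test points are exchangeable, intervals built this way cover the true target with probability at least 1 - alpha on average. The printed empirical coverage should therefore land near 0.9, though not exactly on it. The guarantee does not depend on how good the features or the regression head are; poorer features simply produce wider intervals.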