Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI
The proposed method provides a practical tool for assessing the trustworthiness of AI predictions in critical scientific fields such as weather forecasting and fluid dynamics. The contribution is incremental, however, as it builds on existing OOD detection methods.
The paper tackles the problem of detecting out-of-distribution (OOD) failures in regression tasks for scientific AI. It proposes a task-aware method based on joint likelihood estimation with a score-based diffusion model, and shows that the estimated likelihood correlates strongly with prediction error across datasets spanning PDEs, satellite imagery, and brain tumor segmentation.
Data-driven models are increasingly adopted in critical scientific fields like weather forecasting and fluid dynamics. These methods can fail on out-of-distribution (OOD) data, but detecting such failures in regression tasks is an open challenge. We propose a new OOD detection method based on estimating joint likelihoods using a score-based diffusion model. This approach considers not just the input but also the regression model's prediction, providing a task-aware reliability score. Across numerous scientific datasets, including PDE datasets, satellite imagery and brain tumor segmentation, we show that this likelihood strongly correlates with prediction error. Our work provides a foundational step towards building a verifiable 'certificate of trust', thereby offering a practical tool for assessing the trustworthiness of AI-based scientific predictions. Our code is publicly available at https://github.com/bogdanraonic3/OOD_Detection_ScientificML
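The idea of scoring an (input, prediction) pair by its joint likelihood can be illustrated with a toy sketch. This is not the authors' implementation: a multivariate Gaussian stands in for the score-based diffusion model, the regression model is a cubic polynomial fit, and the data are synthetic. The sketch also assumes that the density is fit on training (input, target) pairs and then evaluated on (input, prediction) pairs at test time. All names (`joint_log_likelihood`, `spearman`, `predict`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: inputs concentrated near zero, noisy targets.
x_train = rng.normal(0.0, 0.5, size=2000)
y_train = np.sin(x_train) + rng.normal(0.0, 0.05, size=2000)

# Regression model under test: a cubic polynomial fit (toy stand-in).
coeffs = np.polyfit(x_train, y_train, deg=3)

def predict(x):
    return np.polyval(coeffs, x)

# ASSUMPTION: a Gaussian density replaces the score-based diffusion model.
# It is fit on joint (input, target) pairs from the training distribution.
joint = np.stack([x_train, y_train], axis=1)
mean = joint.mean(axis=0)
cov = np.cov(joint, rowvar=False)
cov_inv = np.linalg.inv(cov)
log_norm = -0.5 * (2 * np.log(2 * np.pi) + np.linalg.slogdet(cov)[1])

def joint_log_likelihood(x, y_pred):
    """Log-density of (input, prediction) pairs under the fitted Gaussian."""
    z = np.stack([x, y_pred], axis=1) - mean
    maha = np.einsum("ni,ij,nj->n", z, cov_inv, z)
    return log_norm - 0.5 * maha

def spearman(a, b):
    """Spearman rank correlation (values are continuous, so no ties)."""
    ranks_a = np.argsort(np.argsort(a)).astype(float)
    ranks_b = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ranks_a, ranks_b)[0, 1]

# Test inputs cover a wider range than training, so many are OOD.
x_test = rng.uniform(-3.0, 3.0, size=1000)
y_pred = predict(x_test)
error = np.abs(y_pred - np.sin(x_test))

# Reliability score: negative joint log-likelihood (higher = less trusted).
score = -joint_log_likelihood(x_test, y_pred)
rho = spearman(score, error)
print(f"Spearman correlation between score and error: {rho:.2f}")
```

In this toy setting the score needs no ground-truth targets at test time, only the input and the model's own prediction, which is what makes it usable as a reliability signal. A Gaussian is far too simple for high-dimensional fields such as PDE solutions or images; it is used here only to keep the sketch short.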