Towards Instance-Wise Calibration: Local Amortized Diagnostics and Reshaping of Conditional Densities (LADaR)
LADaR addresses the challenge of ensuring calibrated predictive distributions in key science applications such as cosmology and weather forecasting, offering a new method for a known bottleneck.
The paper tackles the problem of assessing and achieving instance-wise calibration for predictive distributions in complex tasks such as galaxy distance estimation. It introduces the LADaR framework and the Cal-PIT algorithm, which learns interpretable local diagnostics and adjusts conditional density estimates. In a benchmark for galaxy distance estimation, Cal-PIT achieves better instance-wise calibration than all 11 other methods compared.
Key science questions, such as galaxy distance estimation and weather forecasting, often require knowing the full predictive distribution of a target variable $y$ given complex inputs $\mathbf{x}$. Despite recent advances in machine learning and physics-based models, it remains challenging to assess whether an initial model is calibrated for all $\mathbf{x}$, and when needed, to reshape the densities of $y$ toward "instance-wise" calibration. This paper introduces the LADaR (Local Amortized Diagnostics and Reshaping of Conditional Densities) framework and proposes a new computationally efficient algorithm ($\texttt{Cal-PIT}$) that produces interpretable local diagnostics and provides a mechanism for adjusting conditional density estimates (CDEs). $\texttt{Cal-PIT}$ learns a single interpretable local probability--probability map from calibration data that identifies where and how the initial model is miscalibrated across feature space, which can be used to morph CDEs such that they are well-calibrated. We illustrate the LADaR framework on synthetic examples, including probabilistic forecasting from image sequences, akin to predicting storm wind speed from satellite imagery. Our main science application involves estimating the probability density functions of galaxy distances given photometric data, where $\texttt{Cal-PIT}$ achieves better instance-wise calibration than all 11 other literature methods in a benchmark data challenge, demonstrating its utility for next-generation cosmological analyses.
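The idea of a local probability-probability map can be illustrated with a small sketch. It is not the authors' implementation: it assumes the map at a point $\mathbf{x}$ is the conditional CDF of the probability integral transform (PIT) values of the initial model, and it swaps in a simple Gaussian kernel smoother for whatever regression method the paper uses. The synthetic data, the initial model `initial_cdf`, the helper names, and the bandwidth are all hypothetical choices made for illustration.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def normal_cdf(y, mean, std):
    """CDF of N(mean, std^2), evaluated elementwise."""
    return 0.5 * (1.0 + _erf((y - mean) / (std * math.sqrt(2.0))))

def initial_cdf(y, x):
    """Hypothetical initial model: claims y | x ~ N(x, 1) for every x."""
    return normal_cdf(y, x, 1.0)

def local_pp_map(alpha, x0, x_cal, pit_cal, bandwidth=0.1):
    """Kernel-weighted estimate of P(PIT <= alpha | x = x0).

    A stand-in for a learned regression of the indicator 1{PIT <= alpha}
    on (x, alpha); the kernel smoother is an assumption of this sketch.
    """
    w = np.exp(-0.5 * ((x_cal - x0) / bandwidth) ** 2)
    alpha = np.atleast_1d(alpha)
    indicators = pit_cal[None, :] <= alpha[:, None]  # shape (n_alpha, n_cal)
    return (indicators * w).sum(axis=1) / w.sum()

def reshaped_cdf(y, x0, x_cal, pit_cal, bandwidth=0.1):
    """Compose the local P-P map with the initial CDF to morph the estimate."""
    return local_pp_map(initial_cdf(y, x0), x0, x_cal, pit_cal, bandwidth)

# Synthetic calibration set: the true spread grows with x, so the initial
# model is too wide for small x and too narrow for large x.
rng = np.random.default_rng(0)
n_cal = 20000
x_cal = rng.uniform(0.0, 1.0, size=n_cal)
y_cal = rng.normal(x_cal, 0.5 + x_cal)
pit_cal = initial_cdf(y_cal, x_cal)  # PIT values of the initial model

# Local diagnostics: P-P curves at two locations in feature space.
alphas = np.linspace(0.05, 0.95, 19)
pp_at_small_x = local_pp_map(alphas, 0.1, x_cal, pit_cal)
pp_at_large_x = local_pp_map(alphas, 0.9, x_cal, pit_cal)

# Reshaping: compare initial and reshaped CDF values with the truth at x0 = 0.9.
x0 = 0.9
y0 = x0 - 1.645
cdf_true = float(normal_cdf(y0, x0, 0.5 + x0))
cdf_initial = float(initial_cdf(y0, x0))
cdf_reshaped = float(reshaped_cdf(y0, x0, x_cal, pit_cal)[0])
print(f"true={cdf_true:.3f} initial={cdf_initial:.3f} reshaped={cdf_reshaped:.3f}")
```

A curve that lies on the diagonal would indicate local calibration at that $\mathbf{x}$. In this toy setup the curve at small $x$ falls below the diagonal in the lower tail (the initial model is too wide there) and the curve at large $x$ rises above it (the initial model is too narrow), which is the "where and how" information the abstract describes. A single global P-P plot would average these opposite effects together.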