A Large-Scale Study of Probabilistic Calibration in Neural Network Regression
This work addresses the need for accurate probabilistic predictions in regression tasks, which are crucial for optimal decision-making in many applications. The contribution is incremental, however, as it extends existing calibration research to regression.
The paper tackles probabilistic calibration in neural network regression. It conducts a large-scale empirical study that assesses calibration and evaluates recalibration, conformal prediction, and regularization methods. It finds that regularization offers a favorable tradeoff between calibration and sharpness, while post-hoc methods achieve superior calibration, which the authors attribute to the finite-sample coverage guarantee of conformal prediction.
Accurate probabilistic predictions are essential for optimal decision making. While neural network miscalibration has been studied primarily in classification, we investigate it in the less-explored domain of regression. We conduct the largest empirical study to date to assess the probabilistic calibration of neural networks. We also analyze the performance of recalibration, conformal, and regularization methods to enhance probabilistic calibration. Additionally, we introduce novel differentiable recalibration and regularization methods, uncovering new insights into their effectiveness. Our findings reveal that regularization methods offer a favorable tradeoff between calibration and sharpness. Post-hoc methods exhibit superior probabilistic calibration, which we attribute to the finite-sample coverage guarantee of conformal prediction. Furthermore, we demonstrate that quantile recalibration can be viewed as a special case of conformal prediction. Our study is fully reproducible and implemented in a common code base for fair comparisons.
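To make the notions of probabilistic calibration and quantile recalibration concrete, the following is a minimal sketch and not the implementation used in the study. It assumes Gaussian predictive distributions, synthetic data, and a deliberately overconfident model. The function names (`pit_values`, `calibration_error`, `fit_recalibration_map`) and the particular error metric (mean absolute gap between the empirical CDF of the probability integral transform values and the uniform CDF) are illustrative choices rather than definitions taken from the paper.

```python
import math
import numpy as np

# Vectorized error function built from the standard library (avoids a SciPy dependency).
_erf = np.vectorize(math.erf)


def pit_values(y, mu, sigma):
    """Probability integral transform Z_i = F_i(y_i) for Gaussian predictions N(mu_i, sigma_i^2)."""
    return 0.5 * (1.0 + _erf((y - mu) / (sigma * math.sqrt(2.0))))


def calibration_error(pit, n_levels=99):
    """Mean absolute gap between the empirical CDF of the PIT values and the uniform CDF.

    A probabilistically calibrated model has uniformly distributed PIT values,
    so this quantity should be close to zero.
    """
    levels = np.linspace(0.01, 0.99, n_levels)
    empirical = (pit[:, None] <= levels[None, :]).mean(axis=0)
    return float(np.abs(empirical - levels).mean())


def fit_recalibration_map(pit_cal):
    """Quantile recalibration: return the empirical CDF of the calibration-set PIT values."""
    sorted_pit = np.sort(pit_cal)

    def recal(p):
        # Fraction of calibration PIT values that are <= p.
        return np.searchsorted(sorted_pit, p, side="right") / sorted_pit.size

    return recal


# Synthetic setup (assumption): the true noise scale is 1.0, but the model predicts 0.5.
rng = np.random.default_rng(0)
n = 4000
mu = rng.normal(size=n)
y = mu + rng.normal(size=n)
pred_sigma = np.full(n, 0.5)

pit = pit_values(y, mu, pred_sigma)
pit_cal, pit_test = pit[: n // 2], pit[n // 2 :]

# Fit the post-hoc map on the calibration split, then evaluate on the held-out split.
recal = fit_recalibration_map(pit_cal)
err_before = calibration_error(pit_test)
err_after = calibration_error(recal(pit_test))

print(f"calibration error before recalibration: {err_before:.3f}")
print(f"calibration error after recalibration:  {err_after:.3f}")
```

In this sketch the recalibration map is fitted on a held-out calibration split and composed with the predictive CDF, which is the post-hoc pattern that recalibration and conformal methods share. Sharpness is not measured here; a fuller comparison would also track how concentrated the recalibrated predictive distributions are.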