Decoupling of neural network calibration measures
This work addresses calibration inconsistencies in neural networks for safety-critical autonomous driving systems, but it is incremental as it builds on known issues with existing metrics.
The paper investigates the coupling of neural network calibration measures. It reveals inconsistencies in determining the optimal calibration with the Expected Calibration Error (ECE), the Area Under the Sparsification Error curve (AUSE), the Uncertainty Calibration Score (UCS), and the Uncertainty Calibration Error (UCE); these inconsistencies prevent a unique model calibration for safety-critical applications such as autonomous driving. It also proposes the AUSE as an indirect measure of the residual uncertainty, which is driven by aleatoric and epistemic contributions.
Considerable effort is currently invested in safeguarding autonomous driving systems, which rely heavily on deep neural networks for computer vision. We investigate the coupling of different neural network calibration measures with a special focus on the Area Under the Sparsification Error curve (AUSE) metric. We elaborate on the well-known inconsistency in determining the optimal calibration using the Expected Calibration Error (ECE), and we demonstrate similar issues for the AUSE, the Uncertainty Calibration Score (UCS), and the Uncertainty Calibration Error (UCE). We conclude that the current methodologies leave a degree of freedom, which prevents a unique model calibration for the homologation of safety-critical functionalities. Furthermore, we propose the AUSE as an indirect measure of the residual uncertainty, which is irreducible for a fixed network architecture and is driven by the stochasticity of the underlying data generation process (aleatoric contribution) as well as by the limitations of the hypothesis space (epistemic contribution).
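For readers unfamiliar with the AUSE, the following is a minimal sketch of one common way to compute it, not the implementation used in this work. Samples are removed in order of decreasing predicted uncertainty, the mean error of the remaining samples is tracked, and the result is compared with an oracle that removes samples in order of decreasing true error. The function names, the number of sparsification steps, and the normalization by the mean error of the full set are assumptions made for illustration.

```python
import numpy as np


def sparsification_curve(errors, ranking, fractions):
    """Mean error of the samples kept after removing the top-ranked fraction."""
    order = np.argsort(-ranking)      # highest ranking value is removed first
    sorted_errors = errors[order]
    n = len(errors)
    curve = []
    for f in fractions:
        removed = int(np.floor(f * n))  # always < n because f < 1
        curve.append(sorted_errors[removed:].mean())
    return np.array(curve)


def ause(errors, uncertainties, num_steps=20):
    """Area between the uncertainty-ranked and the oracle sparsification curves.

    Assumption: both curves are normalized by the mean error of the full set,
    and the area is approximated with the trapezoidal rule.
    """
    errors = np.asarray(errors, dtype=float)
    uncertainties = np.asarray(uncertainties, dtype=float)
    fractions = np.linspace(0.0, 1.0, num_steps, endpoint=False)

    by_uncertainty = sparsification_curve(errors, uncertainties, fractions)
    oracle = sparsification_curve(errors, errors, fractions)

    norm = by_uncertainty[0]          # mean error with nothing removed
    if norm == 0.0:
        return 0.0                    # no error at all, nothing to rank
    diff = (by_uncertainty - oracle) / norm
    widths = np.diff(fractions)
    return float(np.sum(0.5 * (diff[1:] + diff[:-1]) * widths))
```

With this definition, an uncertainty estimate that ranks the samples exactly as the true errors do yields an AUSE of zero, and any deviation from the oracle ranking increases the value. Note that the score depends only on the ordering of the uncertainties, not on their scale.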