LG | ML | May 22, 2024

Towards Certification of Uncertainty Calibration under Adversarial Attacks

arXiv:2405.13922v3 | 2 citations | h-index: 12 | ICLR
Originality: Incremental advance
AI Analysis

This work addresses the need for reliable uncertainty calibration in safety-critical applications, offering incremental improvements through certification and training methods.

The paper addresses the vulnerability of neural classifiers to adversarial attacks that degrade their calibration. It proposes certified calibration, defined as worst-case bounds on calibration under such perturbations, and introduces adversarial calibration training to improve model calibration.

Since neural classifiers are known to be sensitive to adversarial perturbations that alter their accuracy, certification methods have been developed to provide provable guarantees on the insensitivity of their predictions to such perturbations. Furthermore, in safety-critical applications, the frequentist interpretation of the confidence of a classifier (also known as model calibration) can be of utmost importance. This property can be measured via the Brier score or the expected calibration error. We show that attacks can significantly harm calibration, and thus propose certified calibration as worst-case bounds on calibration under adversarial perturbations. Specifically, we produce analytic bounds for the Brier score and approximate bounds via the solution of a mixed-integer program on the expected calibration error. Finally, we propose novel calibration attacks and demonstrate how they can improve model calibration through adversarial calibration training.
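To make the two calibration metrics named in the abstract concrete, the following is a minimal sketch of how they are commonly computed. It is not the paper's implementation. It assumes the top-label form of the Brier score (confidence in the predicted class against 0/1 correctness) and an equal-width binned estimate of the expected calibration error. The function names, the `n_bins` parameter, and the toy data are hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple


def brier_score(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean squared gap between top-label confidence and 0/1 correctness.

    Assumption: top-label variant; other definitions sum over all classes.
    """
    n = len(confidences)
    return sum((c - float(y)) ** 2 for c, y in zip(confidences, correct)) / n


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width binned ECE: sample-weighted mean of |accuracy - confidence|."""
    n = len(confidences)
    bins: List[List[Tuple[float, float]]] = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, float(y)))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Hypothetical toy data: four predictions with their top-label confidences.
    conf = [0.9, 0.9, 0.6, 0.6]
    hit = [True, False, True, False]
    print("Brier score:", brier_score(conf, hit))
    print("ECE:", expected_calibration_error(conf, hit))
```

Both metrics are zero for a classifier whose confidence always matches its correctness. They grow when confidences are pushed away from observed accuracy, which is the quantity a calibration attack would try to increase and a certificate would try to bound.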

