Statistical Guarantees for the Robustness of Bayesian Neural Networks
This work addresses the need for reliable robustness assessment of BNNs in applications such as image classification. Its contribution is incremental in that it builds on existing statistical verification techniques.
The authors quantify the robustness of Bayesian Neural Networks (BNNs) against adversarial examples by introducing a probabilistic robustness measure. They develop a framework that estimates this measure with statistical guarantees and evaluate it on MNIST and a two-class subset of GTSRB, which enables uncertainty quantification in adversarial settings.
We introduce a probabilistic robustness measure for Bayesian Neural Networks (BNNs), defined as the probability that, given a test point, there exists a point within a bounded set such that the BNN prediction differs between the two. Such a measure can be used, for instance, to quantify the probability of the existence of adversarial examples. Building on statistical verification techniques for probabilistic models, we develop a framework that allows us to estimate probabilistic robustness for a BNN with statistical guarantees, i.e., with a priori error and confidence bounds. We provide an experimental comparison of several approximate BNN inference techniques on image classification tasks associated with MNIST and a two-class subset of the GTSRB dataset. Our results enable quantification of the uncertainty of BNN predictions in adversarial settings.
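To illustrate what "a priori error and confidence bounds" can mean in practice, the following is a minimal sketch, not the framework described above. It assumes a two-sided Hoeffding bound to fix the number of samples in advance; the actual estimation scheme may use a different (for example, tighter or sequential) bound. The names `hoeffding_sample_size`, `estimate_probabilistic_robustness`, and `sample_violation` are hypothetical. In a real setting, `sample_violation` would draw one set of weights from the approximate posterior and check whether some point in the bounded set around the test point changes the prediction; here it is replaced by a toy Bernoulli draw.

```python
import math
import random


def hoeffding_sample_size(eps: float, delta: float) -> int:
    """Samples needed so that |p_hat - p| <= eps holds with probability >= 1 - delta.

    Uses the two-sided Hoeffding bound n >= ln(2 / delta) / (2 * eps^2).
    This choice of bound is an assumption made for illustration.
    """
    if not (0.0 < eps < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("eps and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def estimate_probabilistic_robustness(sample_violation, eps: float, delta: float):
    """Monte Carlo estimate of the probability that a prediction change exists.

    sample_violation: zero-argument callable returning True when, for one
    posterior weight sample, a point in the bounded set changes the prediction
    (hypothetical interface).
    Returns the estimate and the number of samples used.
    """
    n = hoeffding_sample_size(eps, delta)
    hits = sum(1 for _ in range(n) if sample_violation())
    return hits / n, n


# Toy stand-in for "sample weights, then search the bounded set":
# each draw reports a violation with probability 0.3.
rng = random.Random(0)


def toy_violation() -> bool:
    return rng.random() < 0.3


p_hat, n_samples = estimate_probabilistic_robustness(toy_violation, eps=0.05, delta=0.01)
```

The sample count depends only on the chosen error `eps` and confidence `delta`, which is what makes the bound a priori: it is known before any posterior sample is drawn. Note that if the per-sample check is itself a heuristic attack rather than a sound verifier, the estimate only bounds the quantity of interest from one side.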