LG | CR | Dec 23, 2020

Gradient-Free Adversarial Attacks for Bayesian Neural Networks

arXiv:2012.12640v1
3 citations
AI Analysis

This work addresses the problem of understanding the adversarial robustness of Bayesian Neural Networks and is aimed at researchers and practitioners working with uncertainty-calibrated models. It represents an incremental step in adversarial attack methodologies.

This paper investigates the adversarial robustness of Bayesian Neural Networks (BNNs) under approximate Bayesian inference, a setting in which their robustness is still not well understood. The authors adapt gradient-free optimization methods (genetic algorithms, surrogate models, and zeroth-order optimization) to find adversarial examples for BNNs. On the MNIST and Fashion MNIST datasets, they report that these methods can greatly improve the rate of finding adversarial examples compared to state-of-the-art gradient-based methods.

The existence of adversarial examples underscores the importance of understanding the robustness of machine learning models. Bayesian neural networks (BNNs), due to their calibrated uncertainty, have been shown to possess favorable adversarial robustness properties. However, when approximate Bayesian inference methods are employed, the adversarial robustness of BNNs is still not well understood. In this work, we employ gradient-free optimization methods in order to find adversarial examples for BNNs. In particular, we consider genetic algorithms, surrogate models, as well as zeroth-order optimization methods and adapt them to the goal of finding adversarial examples for BNNs. In an empirical evaluation on the MNIST and Fashion MNIST datasets, we show that for various approximate Bayesian inference methods the usage of gradient-free algorithms can greatly improve the rate of finding adversarial examples compared to state-of-the-art gradient-based methods.
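To make the idea of a gradient-free attack concrete, here is a minimal sketch of a genetic algorithm searching for an adversarial perturbation inside an L-infinity ball. It is an illustration under stated assumptions, not the authors' implementation: the "BNN" is a toy linear classifier whose posterior is represented by a handful of sampled weight matrices, and every name (`predictive_probs`, `genetic_attack`, `eps`, `sigma`, the population settings) is hypothetical. The attack only queries the Monte Carlo predictive distribution and never uses gradients.

```python
import numpy as np

def predictive_probs(x, weight_samples):
    """Monte Carlo predictive distribution: softmax averaged over weight samples.

    weight_samples has shape (S, C, D): S posterior samples of a toy linear
    classifier with C classes and D input features (an assumption for this sketch).
    """
    logits = weight_samples @ x                          # shape (S, C)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs.mean(axis=0)                            # shape (C,)

def genetic_attack(x, label, weight_samples, eps=0.3, pop_size=20,
                   generations=50, sigma=0.05, seed=0):
    """Search for a perturbation with |delta|_inf <= eps that lowers p(label | x)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-eps, eps, size=(pop_size, x.size))
    pop[0] = 0.0  # include the unperturbed input so the result is never worse

    def fitness(delta):
        # Higher is better: negative predictive probability of the true class.
        x_pert = np.clip(x + delta, 0.0, 1.0)
        return -predictive_probs(x_pert, weight_samples)[label]

    for _ in range(generations):
        scores = np.array([fitness(d) for d in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]            # elitism: keep the top half
        n_children = pop_size - len(parents)
        idx_a = rng.integers(len(parents), size=n_children)
        idx_b = rng.integers(len(parents), size=n_children)
        mask = rng.random((n_children, x.size)) < 0.5    # uniform crossover
        children = np.where(mask, parents[idx_a], parents[idx_b])
        children = children + rng.normal(0.0, sigma, size=children.shape)  # mutation
        children = np.clip(children, -eps, eps)          # stay inside the eps-ball
        pop = np.vstack([parents, children])

    scores = np.array([fitness(d) for d in pop])
    best = pop[np.argmax(scores)]
    return np.clip(x + best, 0.0, 1.0)

# Toy setup: 10 weight samples around a random mean, 3 classes, 16 features.
rng = np.random.default_rng(42)
mean_weights = rng.normal(size=(3, 16))
weight_samples = mean_weights + 0.1 * rng.normal(size=(10, 3, 16))
x = rng.random(16)
label = int(np.argmax(predictive_probs(x, weight_samples)))

x_adv = genetic_attack(x, label, weight_samples)
p_clean = predictive_probs(x, weight_samples)[label]
p_adv = predictive_probs(x_adv, weight_samples)[label]
success = int(np.argmax(predictive_probs(x_adv, weight_samples))) != label
```

Because the unperturbed input is part of the initial population and the top half survives each generation, the predictive probability of the original label can only stay equal or decrease. Whether the predicted class actually flips (`success`) depends on the model and on `eps`.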

Code Implementations: 1 repo