ML, LG · Jun 2, 2018

Sufficient Conditions for Idealised Models to Have No Adversarial Examples: a Theoretical and Empirical Study with Bayesian Neural Networks

arXiv:1806.00667v3 · 21.0 · 56 citations

Originality: Highly original

AI Analysis

This paper addresses adversarial vulnerability in machine learning models, particularly in security-critical applications, by providing theoretical insights and practical defenses, though it builds incrementally on existing BNN and adversarial-example research.

The paper proves that idealized models can have no adversarial examples under two sufficient conditions, and shows that idealized Bayesian neural networks (BNNs) satisfy them. Its experiments show that near-perfect epistemic uncertainty correlates with density under the image manifold and that adversarial images lie off the manifold.

We prove, under two sufficient conditions, that idealised models can have no adversarial examples. We discuss which idealised models satisfy our conditions, and show that idealised Bayesian neural networks (BNNs) satisfy these. We continue by studying near-idealised BNNs using HMC inference, demonstrating the theoretical ideas in practice. We experiment with HMC on synthetic data derived from MNIST for which we know the ground-truth image density, showing that near-perfect epistemic uncertainty correlates with density under the image manifold, and that adversarial images lie off the manifold in our setting. This suggests why MC dropout, which can be seen as performing approximate inference, has been observed to be an effective defence against adversarial examples in practice. We highlight failure cases of non-idealised BNNs relying on dropout, suggesting a new attack on dropout models as well as a new defence. Lastly, we demonstrate the defence on a cats-vs-dogs image classification task with a VGG13 variant.
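
Epistemic uncertainty from a sample-based approximation such as MC dropout or HMC is commonly measured as the mutual information between the prediction and the model parameters. The sketch below assumes that measure (it is not stated here which measure the paper uses) and computes it from T stochastic forward passes. The function names `mutual_information` and `flag_input` and the 0.1 rejection threshold are hypothetical, chosen for illustration only.

```python
import math
from typing import List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def mutual_information(samples: List[Sequence[float]]) -> float:
    """Estimate epistemic uncertainty from T stochastic predictive samples.

    Each entry of `samples` is a class-probability vector from one
    stochastic forward pass (e.g. one dropout mask or one HMC draw).
    MI = H[mean_t p_t] - mean_t H[p_t]
    """
    t = len(samples)
    k = len(samples[0])
    # Predictive distribution: average of the sampled class probabilities.
    mean_p = [sum(s[c] for s in samples) / t for c in range(k)]
    predictive_entropy = entropy(mean_p)
    # Expected entropy of the individual samples (aleatoric part).
    expected_entropy = sum(entropy(s) for s in samples) / t
    return predictive_entropy - expected_entropy


def flag_input(samples: List[Sequence[float]], threshold: float = 0.1) -> bool:
    """Hypothetical defence: reject inputs whose epistemic uncertainty is high."""
    return mutual_information(samples) > threshold


if __name__ == "__main__":
    agree = [[0.5, 0.5]] * 10                   # samples agree, each uncertain
    disagree = [[0.99, 0.01], [0.01, 0.99]] * 5  # confident but contradictory
    print(mutual_information(agree), flag_input(agree))
    print(mutual_information(disagree), flag_input(disagree))
```

Samples that agree, even when each one is individually uncertain, give a mutual information near zero; samples that confidently disagree give a large value (about 0.64 nats here). Under the stated assumption, the second pattern is the one a defence of this kind would flag as likely off-manifold.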
