LG CV ML Mar 20, 2020

Adversarial Robustness on In- and Out-Distribution Improves Explainability

Maximilian Augustin, Alexander Meinke, Matthias Hein

arXiv:2003.09461v2 19.5112 citations Has Code

Originality: Incremental advance

AI Analysis

The paper addresses robustness and explainability issues in image classification models, though the contribution appears incremental because it builds on adversarial training.

The paper tackles two weaknesses of neural networks: they are not robust to adversarial changes, and their uncertainty estimates on out-distribution samples are unreliable. It proposes RATIO, a training procedure that achieves state-of-the-art $l_2$-adversarial robustness on CIFAR10 while maintaining better clean accuracy than adversarial training.

Neural networks have led to major improvements in image classification but suffer from being non-robust to adversarial changes, unreliable uncertainty estimates on out-distribution samples and their inscrutable black-box decisions. In this work we propose RATIO, a training procedure for Robustness via Adversarial Training on In- and Out-distribution, which leads to robust models with reliable and robust confidence estimates on the out-distribution. RATIO has similar generative properties to adversarial training so that visual counterfactuals produce class specific features. While adversarial training comes at the price of lower clean accuracy, RATIO achieves state-of-the-art $l_2$-adversarial robustness on CIFAR10 and maintains better clean accuracy.
