LG CR CV | Dec 5, 2022

Bayesian Learning with Information Gain Provably Bounds Risk for a Robust Adversarial Defense

Bao Gia Doan, Ehsan Abbasnejad, Javen Qinfeng Shi, Damith C. Ranasinghe

arXiv:2212.02003v26.98 citations | h-index: 47 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses robustness in deep learning against adversarial attacks, offering an incremental improvement for adversarial defense methods.

The paper tackles mode collapse in adversarially trained Bayesian Neural Networks, which limits their robustness. It proposes a method to prevent mode collapse and an information gain objective that aligns what the model learns from benign and adversarial inputs. The reported result is up to 20% improved robustness on the CIFAR-10 and STL-10 datasets under PGD attacks.

We present a new algorithm to learn a deep neural network model robust against adversarial attacks. Previous algorithms demonstrate that an adversarially trained Bayesian Neural Network (BNN) provides improved robustness. We recognize that the adversarial learning approach for approximating the multi-modal posterior distribution of a Bayesian model can lead to mode collapse; consequently, the model's achievements in robustness and performance are sub-optimal. Instead, we first propose preventing mode collapse to better approximate the multi-modal posterior distribution. Second, based on the intuition that a robust model should ignore perturbations and only consider the informative content of the input, we conceptualize and formulate an information gain objective to measure and force the information learned from both benign and adversarial training instances to be similar. Importantly, we prove and demonstrate that minimizing the information gain objective allows the adversarial risk to approach the conventional empirical risk. We believe our efforts provide a step toward a basis for a principled method of adversarially training BNNs. Our model demonstrates significantly improved robustness (up to 20%) compared with adversarial training and Adv-BNN under PGD attacks with 0.035 distortion on both CIFAR-10 and STL-10 datasets.
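For readers unfamiliar with the PGD attack named in the abstract, the following is a minimal sketch of projected gradient descent against a toy logistic-regression model. It is not the paper's code or its BNN defense. The model, the helper names (`logistic_loss`, `loss_grad_wrt_input`, `pgd_attack`), the step size, and the iteration count are all hypothetical. The sketch also assumes that the "0.035 distortion" refers to an L-infinity bound on inputs scaled to [0, 1], which the abstract does not state, and it omits the random start that PGD implementations often use.

```python
import math

def _sigmoid_output(w, b, x):
    """Predicted probability of class 1 for a linear model (toy stand-in)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def logistic_loss(w, b, x, y):
    """Binary cross-entropy on one example, with y in {0, 1}."""
    p = min(max(_sigmoid_output(w, b, x), 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def loss_grad_wrt_input(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = _sigmoid_output(w, b, x)
    return [(p - y) * wi for wi in w]

def pgd_attack(w, b, x, y, eps=0.035, step=0.01, iters=10):
    """Iterated signed-gradient ascent, projected onto an L-infinity ball.

    eps=0.035 mirrors the distortion quoted in the abstract (assumed to be
    an L-infinity radius); step and iters are illustrative values only.
    """
    x_adv = list(x)
    for _ in range(iters):
        grad = loss_grad_wrt_input(w, b, x_adv, y)
        # Ascent step: move each coordinate in the direction that raises the loss.
        x_adv = [xa + step * ((g > 0) - (g < 0)) for xa, g in zip(x_adv, grad)]
        # Projection: stay within eps of the clean input ...
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
        # ... and within the valid input range [0, 1].
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv

# Toy usage with made-up weights and a made-up input.
w, b = [2.0, -1.0, 0.5], -0.2
x_clean, y_true = [0.6, 0.3, 0.8], 1
x_adv = pgd_attack(w, b, x_clean, y_true)
clean_loss = logistic_loss(w, b, x_clean, y_true)
adv_loss = logistic_loss(w, b, x_adv, y_true)
print(f"clean loss {clean_loss:.4f}, adversarial loss {adv_loss:.4f}")
```

Adversarial training in general fits the model on inputs such as `x_adv` rather than `x_clean`. The paper's information gain objective and its mode-collapse prevention are separate contributions that this sketch does not attempt to reproduce.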
