CV · Aug 12, 2024

Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment

Kejia Zhang, Juanjuan Weng, Shaozi Li, Zhiming Luo

arXiv:2408.06079v25.24 citations · h-index: 16 · Has Code

Originality: Highly original

AI Analysis

This work improves adversarial robustness for deep learning models in visual tasks, addressing a critical security concern with a novel method that mitigates feature bias.

The paper tackles the vulnerability of deep neural networks to adversarial examples. It identifies biased feature activations under inverse adversarial attacks and proposes Debiased High-Confidence Adversarial Training (DHAT), which is reported to achieve state-of-the-art robustness on CIFAR and ImageNet-1K benchmarks.

Despite the remarkable progress of deep neural networks (DNNs) in various visual tasks, their vulnerability to adversarial examples raises significant security concerns. Recent adversarial training methods leverage inverse adversarial attacks to generate high-confidence examples, aiming to align adversarial distributions with high-confidence class regions. However, our investigation reveals that under inverse adversarial attacks, high-confidence outputs are influenced by biased feature activations, causing models to rely on background features that lack a causal relationship with the labels. This spurious correlation bias leads to overfitting irrelevant background features during adversarial training, thereby degrading the model's robust performance and generalization capabilities. To address this issue, we propose Debiased High-Confidence Adversarial Training (DHAT), a novel approach that aligns adversarial logits with debiased high-confidence logits and restores proper attention by enhancing foreground logit orthogonality. Extensive experiments demonstrate that DHAT achieves state-of-the-art robustness on both CIFAR and ImageNet-1K benchmarks, while significantly improving generalization by mitigating the feature bias inherent in inverse adversarial training approaches. Code is available at https://github.com/KejiaZhang-Robust/DHAT.
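The abstract contrasts ordinary adversarial examples with the high-confidence examples produced by inverse adversarial attacks. The sketch below illustrates only that contrast and is not the DHAT method. It assumes that "inverse" means stepping against the sign of the loss gradient inside the same L-infinity ball, and it uses a toy linear softmax classifier. The weights `W`, the input `x`, the helper names, and the values of `eps`, `step`, and `iters` are all hypothetical choices made for illustration.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, x):
    # Toy linear classifier: one weight row per class (hypothetical model).
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def loss_grad_x(W, x, y):
    # Gradient of cross-entropy w.r.t. the input: W^T (softmax(Wx) - onehot(y)).
    p = softmax(logits(W, x))
    dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return [sum(dz[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def perturb(W, x, y, eps=0.1, step=0.02, iters=10, inverse=False):
    # inverse=False: ascend the loss (standard PGD-style attack).
    # inverse=True: descend the loss, yielding a high-confidence example.
    direction = -1.0 if inverse else 1.0
    x_new = list(x)
    for _ in range(iters):
        g = loss_grad_x(W, x_new, y)
        x_new = [v + direction * step * sign(gj) for v, gj in zip(x_new, g)]
        # Project back into the L-infinity ball of radius eps around x.
        x_new = [min(max(v, x0 - eps), x0 + eps) for v, x0 in zip(x_new, x)]
    return x_new

# Hypothetical two-class model and input.
W = [[1.0, -0.5, 0.2], [-0.3, 0.8, 0.1]]
x = [0.5, 0.2, -0.1]
y = 0

x_adv = perturb(W, x, y)
x_inv = perturb(W, x, y, inverse=True)
p_clean = softmax(logits(W, x))[y]
p_adv = softmax(logits(W, x_adv))[y]
p_inv = softmax(logits(W, x_inv))[y]
print(p_adv, p_clean, p_inv)
```

In this toy setting the true-class probability falls under the standard perturbation and rises under the inverse one, while both perturbed inputs stay within `eps` of `x`. How DHAT debiases the resulting high-confidence logits and aligns adversarial logits to them is described in the paper and the linked repository, not in this sketch.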

