Adaptive Causal Alignment for High-Confidence Adversarial Training
For researchers in adversarial robustness, this work addresses the overlooked issue of spurious correlations in high-confidence predictions, offering a principled solution to improve robust generalization.
The paper identifies a paradox in inverse adversarial training: high confidence often stems from spurious background correlations. It proposes HICAT, a framework that adaptively debiases background context and enforces feature disentanglement. HICAT achieves consistent improvements over baselines on CIFAR-10, CIFAR-100, and ImageNet-1K while reducing the robust generalization gap.
Inverse adversarial training leverages high-confidence predictions to stabilize robust learning, yet we uncover a critical paradox: high confidence often stems from overfitting to non-causal background correlations rather than intrinsic object semantics. Our investigation reveals that visual context is a dual-natured signal, serving as either a necessary supportive prior or a spurious confounder. This insight exposes a flaw in existing strategies that suppress context blindly: because they also discard supportive context, they inevitably lead to severe feature loss. To resolve this, we propose High-Confidence Causally Aligned Training (HICAT), a unified framework that establishes a semantic equilibrium between the two roles of context. Operating on a ``Measure-Debias-Align'' pipeline, HICAT integrates a Learnable Background-Bias Estimator (LBBE) to adaptively diagnose context utility. Guided by this diagnosis, an Adaptive Debiasing mechanism performs targeted logit rectification, complemented by a geometrically grounded Foreground Logit Orthogonal Enhancement (FLOE) loss that enforces feature disentanglement. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet-1K demonstrate that HICAT consistently improves over matched baselines across diverse architectures (CNNs and ViTs) while significantly reducing the robust generalization gap.
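
To make the ``Measure-Debias-Align'' idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract gives no formulas, so everything here is an assumption: the scalar `gate` stands in for the output of a bias estimator such as LBBE, `debias_logits` assumes rectification is a gated subtraction of background-only logits, and `orthogonality_penalty` assumes the disentanglement term is a squared cosine similarity between foreground and background vectors. All function and parameter names are hypothetical.

```python
import math


def debias_logits(full_logits, background_logits, gate):
    """Subtract a gated share of background-only logits (assumed form).

    gate is a scalar in [0, 1], assumed to come from a bias estimator:
    0 keeps the context untouched (treated as a supportive prior),
    1 removes the background contribution entirely (treated as a confounder).
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    return [f - gate * b for f, b in zip(full_logits, background_logits)]


def orthogonality_penalty(foreground, background, eps=1e-12):
    """Squared cosine similarity between two vectors (assumed form).

    The value is zero when the vectors are orthogonal and approaches
    one when they are parallel, so minimizing it pushes them apart.
    """
    dot = sum(f * b for f, b in zip(foreground, background))
    norm_f = math.sqrt(sum(f * f for f in foreground))
    norm_b = math.sqrt(sum(b * b for b in background))
    cos = dot / (norm_f * norm_b + eps)
    return cos * cos


# Hypothetical logits for a three-class problem.
full = [2.0, 0.5, -1.0]
background = [1.5, 0.0, -0.5]

kept = debias_logits(full, background, gate=0.0)      # context kept as-is
removed = debias_logits(full, background, gate=1.0)   # background fully removed
penalty = orthogonality_penalty([1.0, 0.0], [0.0, 1.0])
```

Under these assumptions, a gate near zero leaves the prediction unchanged, while a gate near one removes the background's share of the logits. A gate that varies between these extremes is one way to avoid the feature loss that blind suppression causes.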