LG | CR | Oct 1, 2025

Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness

arXiv:2510.00517v1 | h-index: 3
Originality: Incremental advance
AI Analysis

This work examines the trade-off between discriminative focus and robustness in attention mechanisms. For machine learning practitioners, it offers an incremental insight into the limitations of Differential Attention (DA).

The paper examines how the structural fragility of Differential Attention (DA) increases adversarial vulnerability. In experiments across multiple datasets and models, DA shows higher attack success rates and stronger local sensitivity than standard attention.

Differential Attention (DA) has been proposed as a refinement to standard attention, suppressing redundant or noisy context through a subtractive structure and thereby reducing contextual hallucination. While this design sharpens task-relevant focus, we show that it also introduces a structural fragility under adversarial perturbations. Our theoretical analysis identifies negative gradient alignment, a configuration encouraged by DA's subtraction, as the key driver of sensitivity amplification, leading to increased gradient norms and elevated local Lipschitz constants. We empirically validate this Fragile Principle through systematic experiments on ViT/DiffViT and evaluations of pretrained CLIP/DiffCLIP, spanning five datasets in total. These results demonstrate higher attack success rates, frequent gradient opposition, and stronger local sensitivity compared to standard attention. Furthermore, depth-dependent experiments reveal a robustness crossover: stacking DA layers attenuates small perturbations via depth-dependent noise cancellation, though this protection fades under larger attack budgets. Overall, our findings uncover a fundamental trade-off: DA improves discriminative focus on clean inputs but increases adversarial vulnerability, underscoring the need to jointly design for selectivity and robustness in future attention mechanisms.
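
The link between negative gradient alignment and larger gradient norms can be illustrated with a toy calculation. The sketch below is not the paper's derivation: it assumes a simplified two-branch output f = f1 - lam * f2, whose gradient is g1 - lam * g2, and uses hypothetical toy vectors and a hypothetical weight lam. Under that assumption, the squared norm is ||g1||^2 + lam^2 * ||g2||^2 - 2 * lam * <g1, g2>, so a negative inner product (opposed gradients) adds to the norm instead of cancelling it.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm."""
    return math.sqrt(dot(u, u))


def diff_grad_norm(g1, g2, lam):
    """Norm of g1 - lam * g2, the gradient of a subtractive output f1 - lam * f2."""
    return norm([a - lam * b for a, b in zip(g1, g2)])


lam = 0.5                 # hypothetical subtraction weight (assumption)
g1 = [1.0, 0.0]           # toy gradient of the first branch
g_aligned = [1.0, 0.0]    # second branch points the same way (cosine +1)
g_opposed = [-1.0, 0.0]   # second branch points the opposite way (cosine -1)

aligned = diff_grad_norm(g1, g_aligned, lam)   # subtraction cancels part of g1
opposed = diff_grad_norm(g1, g_opposed, lam)   # subtraction adds to g1

print(f"aligned: {aligned:.2f}, opposed: {opposed:.2f}, single branch: {norm(g1):.2f}")
```

In this toy setting the opposed case yields a gradient norm larger than that of the single branch, while the aligned case yields a smaller one. That is the qualitative pattern the abstract attributes to DA's subtraction.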

