DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense
For deep learning practitioners, this paper addresses adversarial attacks by proposing a defense mechanism that is both effective and efficient, although the contribution appears incremental in that it builds on prior randomized defenses.
The paper tackles the vulnerability of deep neural networks to adversarial examples. It identifies gradient consensus as a key driver of adversarial transferability and introduces DRIFT, a stochastic ensemble of learnable filters that disrupts this consensus. The authors report substantial robustness gains on ImageNet, outperforming state-of-the-art defenses under adaptive white-box, transfer-based, and gradient-free attacks, with negligible runtime and memory cost.
Deep neural networks remain highly vulnerable to adversarial examples, and most defenses collapse once gradients can be reliably estimated. We identify \emph{gradient consensus} -- the tendency of randomized transformations to yield aligned gradients -- as a key driver of adversarial transferability. Attackers exploit this consensus to construct perturbations that remain effective across transformations. We introduce \textbf{DRIFT} (Divergent Response in Filtered Transformations), a stochastic ensemble of lightweight, learnable filters trained to actively disrupt gradient consensus. Unlike prior randomized defenses that rely on gradient masking, DRIFT enforces \emph{gradient dissonance} by maximizing divergence in Jacobian- and logit-space responses while preserving natural predictions. Our contributions are threefold: (i) we formalize gradient consensus and provide a theoretical analysis linking consensus to transferability; (ii) we propose a consensus-divergence training strategy combining prediction consistency, Jacobian separation, logit-space separation, and adversarial robustness; and (iii) we show that DRIFT achieves substantial robustness gains on ImageNet across CNNs and Vision Transformers, outperforming state-of-the-art preprocessing, adversarial training, and diffusion-based defenses under adaptive white-box, transfer-based, and gradient-free attacks. DRIFT delivers these improvements with negligible runtime and memory cost, establishing gradient divergence as a practical and generalizable principle for adversarial defense.
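To make the notion of gradient consensus concrete, the following minimal sketch measures how well the input gradients seen through different filters agree. It is an illustration only: the abstract does not give the paper's formal definition, so the mean pairwise cosine similarity used here is an assumed proxy, and the names `cosine`, `gradient_consensus`, and `linear_filter_gradient` are hypothetical. The toy model is a linear score s(x) = w . (A x), for which the input gradient is exactly A^T w.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def gradient_consensus(grads):
    """Assumed proxy for gradient consensus: mean pairwise cosine similarity.

    grads: list of flattened input gradients, one per sampled filter.
    Values near 1 mean the gradients are aligned (high consensus);
    values near 0 or below mean they disagree (dissonance).
    """
    pairs = list(combinations(grads, 2))
    if not pairs:
        raise ValueError("need at least two gradients")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def linear_filter_gradient(A, w):
    """Input gradient of the toy score s(x) = w . (A x), which equals A^T w."""
    rows, cols = len(A), len(A[0])
    return [sum(A[i][j] * w[i] for i in range(rows)) for j in range(cols)]


# Toy setup (hypothetical values): a 2-D input and a fixed weight vector.
w = [1.0, 2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
near_identity = [[1.0, 0.1], [0.0, 1.0]]   # mild perturbation of the identity
rotation = [[0.0, -1.0], [1.0, 0.0]]       # 90-degree rotation

# Two similar filters give nearly parallel gradients (high consensus).
aligned = [linear_filter_gradient(A, w) for A in (identity, near_identity)]
high = gradient_consensus(aligned)

# A filter that rotates the input gives an orthogonal gradient (no consensus).
divergent = [linear_filter_gradient(A, w) for A in (identity, rotation)]
low = gradient_consensus(divergent)
```

In this toy case `high` is close to 1 and `low` is 0, which mirrors the intuition in the abstract: a perturbation built from one filter's gradient carries over to filters with aligned gradients but not to filters whose gradients diverge. For a real network the gradients would come from automatic differentiation rather than the closed form A^T w.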