CV · CR · LG · Sep 23, 2024

Interpretability-Guided Test-Time Adversarial Defense

arXiv:2409.15190v1 · 3 citations · h-index: 24
Originality: Incremental advance
AI Analysis

This work gives machine learning practitioners an efficient and effective defense against adversarial attacks. The advance is incremental, since it builds on existing test-time defense approaches.

The paper tackles the problem of adversarial attacks on neural networks by proposing a low-cost, training-free test-time defense that uses interpretability-guided neuron importance ranking to improve robustness-accuracy tradeoffs with minimal computational overhead. It demonstrates efficacy on CIFAR10, CIFAR100, and ImageNet-1k with average gains of 2.6%, 4.9%, and 2.8% respectively, and shows improvements of 1.5% over state-of-the-art defenses under adaptive attacks.

We propose a novel and low-cost test-time adversarial defense by devising interpretability-guided neuron importance ranking methods to identify neurons important to the output classes. Our method is a training-free approach that can significantly improve the robustness-accuracy tradeoff while incurring minimal computational overhead. While being among the most efficient test-time defenses (4x faster), our method is also robust to a wide range of black-box, white-box, and adaptive attacks that break previous test-time defenses. We demonstrate the efficacy of our method for CIFAR10, CIFAR100, and ImageNet-1k on the standard RobustBench benchmark (with average gains of 2.6%, 4.9%, and 2.8% respectively). We also show improvements (average 1.5%) over the state-of-the-art test-time defenses even under strong adaptive attacks.
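The abstract does not spell out the scoring rule or how the ranking is applied at test time, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical importance score (mean class activation times the weight into that class's logit) and a hypothetical defense step (zeroing neurons that are not important for the provisionally predicted class). The names `rank_neurons`, `masked_predict`, and `top_k` are illustrative and do not come from the paper.

```python
# Hypothetical sketch: class-wise neuron importance ranking plus test-time masking.
# The scoring rule and the masking step are assumptions, not the paper's method.

def rank_neurons(activations, labels, weights, num_classes, top_k):
    """Return {class: set of indices of the top_k most important neurons}.

    activations: list of penultimate-layer feature vectors, one per sample
    labels:      list of integer class labels, one per sample
    weights:     weights[c][j] is the weight from neuron j to the logit of class c
    """
    num_neurons = len(weights[0])
    important = {}
    for c in range(num_classes):
        rows = [a for a, y in zip(activations, labels) if y == c]
        scores = []
        for j in range(num_neurons):
            # Assumed score: mean activation on class-c samples times weight to logit c.
            mean_act = sum(r[j] for r in rows) / max(len(rows), 1)
            scores.append(mean_act * weights[c][j])
        order = sorted(range(num_neurons), key=lambda j: scores[j], reverse=True)
        important[c] = set(order[:top_k])
    return important


def logits(features, weights):
    """Linear classification head without bias."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def masked_predict(features, weights, important):
    """Predict, mask neurons unimportant for that prediction, then predict again."""
    base = logits(features, weights)
    guess = max(range(len(base)), key=base.__getitem__)
    # Training-free step: no weights change, only activations are zeroed.
    masked = [f if j in important[guess] else 0.0 for j, f in enumerate(features)]
    final = logits(masked, weights)
    return max(range(len(final)), key=final.__getitem__)
```

In this sketch the ranking is computed once, offline, from clean samples, and inference adds only one extra pass through the linear head. That matches the training-free, low-overhead framing of the abstract, though the actual ranking methods in the paper may differ.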

Code Implementations: 1 repo