LG | May 19, 2025

A Few Large Shifts: Layer-Inconsistency Based Minimal Overhead Adversarial Example Detection

Sanggeon Yun, Ryozo Masukawa, Hyunwoo Oh, Nathaniel D. Bastian, Mohsen Imani

arXiv:2505.12586v54.12 citations | h-index: 8 | Has Code

Originality: Highly original

AI Analysis

This work addresses the vulnerability of DNNs to adversarial attacks, a concern for applications that require robust AI systems. It offers a practical and efficient detection method that does not rely on external models or adversarial data.

The paper tackles the problem of detecting adversarial examples in deep neural networks by introducing a lightweight, plug-in framework that leverages internal layer-wise inconsistencies, achieving state-of-the-art detection performance on datasets like CIFAR-10, CIFAR-100, and ImageNet with negligible computational overhead.

Deep neural networks (DNNs) are highly susceptible to adversarial examples--subtle, imperceptible perturbations that can lead to incorrect predictions. While detection-based defenses offer a practical alternative to adversarial training, many existing methods depend on external models, complex architectures, or adversarial data, limiting their efficiency and generalizability. We introduce a lightweight, plug-in detection framework that leverages internal layer-wise inconsistencies within the target model itself, requiring only benign data for calibration. Our approach is grounded in the A Few Large Shifts Assumption, which posits that adversarial perturbations induce large, localized violations of layer-wise Lipschitz continuity in a small subset of layers. Building on this, we propose two complementary strategies--Recovery Testing (RT) and Logit-layer Testing (LT)--to empirically measure these violations and expose internal disruptions caused by adversaries. Evaluated on CIFAR-10, CIFAR-100, and ImageNet under both standard and adaptive threat models, our method achieves state-of-the-art detection performance with negligible computational overhead. Furthermore, our system-level analysis provides a practical method for selecting a detection threshold with a formal lower-bound guarantee on accuracy. The code is available here: https://github.com/c0510gy/AFLS-AED.
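The abstract says calibration needs only benign data and that a detection threshold is then selected. A minimal sketch of that general idea follows. It is a generic empirical-quantile rule, not the authors' RT/LT scores and not their threshold procedure with the formal accuracy bound. The names `calibrate_threshold`, `is_flagged`, and `target_fpr` are hypothetical, and the scores stand in for any per-input inconsistency measure where larger means more suspicious.

```python
import math


def calibrate_threshold(benign_scores, target_fpr=0.05):
    """Return a threshold that at most a target_fpr fraction of benign scores exceed.

    Assumption: each score is some scalar inconsistency measure computed on a
    benign calibration input (hypothetical here; not the paper's RT/LT scores).
    """
    if not benign_scores:
        raise ValueError("need at least one benign calibration score")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ordered = sorted(benign_scores)
    n = len(ordered)
    # Number of benign calibration scores allowed to lie strictly above the threshold.
    allowed = min(n - 1, math.floor(target_fpr * n))
    return ordered[n - 1 - allowed]


def is_flagged(score, threshold):
    """Flag an input as suspicious when its score is strictly above the threshold."""
    return score > threshold


if __name__ == "__main__":
    # Toy calibration set standing in for scores measured on benign inputs.
    scores = [float(i) for i in range(1, 101)]
    t = calibrate_threshold(scores, target_fpr=0.05)
    flagged = sum(is_flagged(s, t) for s in scores)
    print(f"threshold={t}, benign inputs flagged={flagged}/{len(scores)}")
```

Because the rule uses only benign scores, it fixes the false-positive rate on the calibration set and says nothing by itself about how many adversarial inputs are caught. That depends on how well the underlying score separates the two populations.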
