LG · CR · Feb 9

Dashed Line Defense: Plug-And-Play Defense Against Adaptive Score-Based Query Attacks

arXiv:2602.08679v1 · 1.4 · h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses a critical security problem for deep learning systems by providing a robust defense against adaptive black-box attacks. The advance is incremental, since it builds on existing plug-and-play runtime defenses.

The paper tackles the vulnerability of deep learning models to adaptive score-based query attacks. It proposes Dashed Line Defense (DLD), a plug-and-play post-processing method that makes the observed loss an ambiguous signal of a candidate's true adversarial strength, which disrupts adversarial example generation. On ImageNet, DLD consistently outperforms prior defenses under worst-case adaptive attacks while preserving model accuracy.

Score-based query attacks pose a serious threat to deep learning models by crafting adversarial examples (AEs) using only black-box access to model output scores, iteratively optimizing inputs based on observed loss values. While recent runtime defenses attempt to disrupt this process via output perturbation, most either require access to model parameters or fail when attackers adapt their tactics. In this paper, we first reveal that even the state-of-the-art plug-and-play defense can be bypassed by adaptive attacks, exposing a critical limitation of existing runtime defenses. We then propose Dashed Line Defense (DLD), a plug-and-play post-processing method specifically designed to withstand adaptive query strategies. By introducing ambiguity in how the observed loss reflects the true adversarial strength of candidate examples, DLD prevents attackers from reliably analyzing and adapting their queries, effectively disrupting the AE generation process. We provide theoretical guarantees of DLD's defense capability and validate its effectiveness through experiments on ImageNet, demonstrating that DLD consistently outperforms prior defenses--even under worst-case adaptive attacks--while preserving the model's predicted labels.
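To make the threat model concrete, here is a minimal sketch of a score-based query attack. It is not the paper's method or any specific published attack. The toy linear model, the margin loss, and the names `toy_model`, `margin_loss`, and `random_search_attack` are all assumptions made for illustration. The attacker sees only output scores, proposes a bounded perturbation, and keeps it when the observed loss drops.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(x, weights):
    """Hypothetical black-box stand-in: returns class probabilities for x."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)


def margin_loss(probs, true_label):
    """Loss the attacker observes: positive while the true label still wins."""
    best_other = max(p for i, p in enumerate(probs) if i != true_label)
    return probs[true_label] - best_other


def random_search_attack(query, x, true_label, eps=0.5, steps=200, seed=0):
    """Greedy random search using only queried scores (illustrative only).

    Each step moves one coordinate to x[i] - eps or x[i] + eps, so the
    candidate always stays within an L-infinity ball of radius eps.
    """
    rng = random.Random(seed)
    best = list(x)
    best_loss = margin_loss(query(best), true_label)
    for _ in range(steps):
        if best_loss < 0:  # the predicted label has flipped; stop querying
            break
        cand = list(best)
        i = rng.randrange(len(x))
        cand[i] = x[i] + rng.choice([-eps, eps])
        loss = margin_loss(query(cand), true_label)
        if loss < best_loss:  # accept only if the observed loss improves
            best, best_loss = cand, loss
    return best, best_loss


# Usage: attack a two-class toy model through its score output only.
weights = [[1.0, 0.0], [0.0, 1.0]]
x0 = [1.0, 0.6]
adv, loss = random_search_attack(lambda x: toy_model(x, weights), x0, true_label=0)
print(adv, loss)
```

The `query` callable is the only channel between attacker and model. A post-processing defense of the kind the abstract describes would sit there, altering the returned scores so that the accept/reject test on the observed loss no longer tracks true progress.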

