CV, Nov 21, 2025

ATAC: Augmentation-Based Test-Time Adversarial Correction for CLIP

arXiv:2511.17362v2, 1 citation
Originality: Highly original
AI Analysis

ATAC addresses the adversarial robustness problem faced by users of CLIP in zero-shot image-text matching, offering a novel and efficient test-time defense against adversarial attacks.

The paper tackles CLIP's vulnerability to adversarial image perturbations by proposing ATAC, a test-time defense that corrects embeddings using augmentation-induced drift vectors. It reports robustness nearly 50% higher on average than previous state-of-the-art methods, with minimal computational overhead.

Despite its remarkable success in zero-shot image-text matching, CLIP remains highly vulnerable to adversarial perturbations on images. As adversarial fine-tuning is prohibitively costly, recent works explore various test-time defense strategies; however, these approaches still exhibit limited robustness. In this work, we revisit this problem and propose a simple yet effective strategy: Augmentation-based Test-time Adversarial Correction (ATAC). Our method operates directly in the embedding space of CLIP, calculating augmentation-induced drift vectors to infer a semantic recovery direction and correcting the embedding based on the angular consistency of these latent drifts. Across a wide range of benchmarks, ATAC consistently achieves remarkably high robustness, surpassing that of previous state-of-the-art methods by nearly 50% on average, all while requiring minimal computational overhead. Furthermore, ATAC retains state-of-the-art robustness in unconventional and extreme settings and even achieves nontrivial robustness against adaptive attacks. Our results demonstrate that ATAC is an efficient method in a novel paradigm for test-time adversarial defenses in the embedding space of CLIP.
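
The abstract names the ingredients (augmentation-induced drift vectors, a recovery direction, and an angular-consistency check) but not the exact formulas. The following is only a minimal sketch of one way those pieces could fit together, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function name `drift_correct`, the mean-drift recovery direction, the mean-cosine consistency score, and the parameters `tau` and `alpha`. The `embed` callable is a stand-in for CLIP's image encoder.

```python
import numpy as np

def normalize(v, eps=1e-12):
    """Scale vectors to unit L2 norm along the last axis."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def drift_correct(embed, image, augmentations, tau=0.5, alpha=1.0):
    """Hypothetical drift-based correction of a single image embedding.

    embed:         callable, image array -> 1-D embedding (stand-in for CLIP).
    augmentations: list of callables, image -> augmented image.
    tau, alpha:    illustrative threshold and step size (not from the paper).
    Returns (embedding, consistency).
    """
    z = normalize(embed(image))
    views = np.stack([normalize(embed(aug(image))) for aug in augmentations])
    drifts = views - z                    # one drift vector per augmented view
    mean_drift = drifts.mean(axis=0)      # assumed recovery direction
    # Assumed consistency score: mean cosine between each drift and the mean drift.
    consistency = float(np.mean(normalize(drifts) @ normalize(mean_drift)))
    if consistency < tau:
        return z, consistency             # drifts disagree: keep the embedding
    return normalize(z + alpha * mean_drift), consistency

# Toy usage with a random linear "encoder" in place of CLIP.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
toy_embed = lambda img: W @ img.ravel()
toy_image = rng.normal(size=(4, 4))
toy_augs = [np.fliplr, np.flipud, lambda a: np.roll(a, 1, axis=1)]
corrected, score = drift_correct(toy_embed, toy_image, toy_augs, tau=0.0)
```

The intuition behind this sketch is that if the augmented views all pull the embedding in roughly the same direction, that shared direction may indicate where the unperturbed embedding lies, whereas inconsistent drifts suggest no correction is warranted. Whether the paper gates the correction with a hard threshold, as here, or weights it continuously is not stated in the abstract.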
