Navigating the Trade-off: A Synthesis of Defensive Strategies for Zero-Shot Adversarial Robustness in Vision-Language Models
This incremental synthesis paper addresses the trade-off between adversarial robustness and zero-shot generalization in defenses for vision-language models. It is relevant to researchers in AI security and multimodal learning.
This paper synthesizes research on defending vision-language models like CLIP against adversarial attacks while preserving zero-shot generalization, analyzing eight papers that explore methods such as adversarial fine-tuning and training-free defenses. It traces the evolution from alignment-preserving techniques to embedding space re-engineering and identifies future directions like hybrid strategies.
This report synthesizes eight seminal papers on the zero-shot adversarial robustness of vision-language models (VLMs) like CLIP. A central challenge in this domain is the inherent trade-off between enhancing adversarial robustness and preserving the model's zero-shot generalization capabilities. We analyze two primary defense paradigms: Adversarial Fine-Tuning (AFT), which modifies model parameters, and Training-Free/Test-Time Defenses, which leave them unchanged. We trace the evolution from alignment-preserving methods (TeCoA) to embedding space re-engineering (LAAT, TIMA), and from input heuristics (AOM, TTC) to latent-space purification (CLIPure). Finally, we identify key challenges and future directions, including hybrid defense strategies and adversarial pre-training.