CV · Jul 30, 2025

On the Reliability of Vision-Language Models Under Adversarial Frequency-Domain Perturbations

Jordan Vice, Naveed Akhtar, Yansong Gao, Richard Hartley, Ajmal Mian

arXiv:2507.22398v3 · 2 citations · h-index: 32

Originality: Incremental advance

AI Analysis

The paper reveals a security weakness in VLMs used for content moderation and automated reasoning, an incremental but important finding for AI safety.

The paper exposes a critical vulnerability in Vision-Language Models (VLMs) by showing that subtle frequency-domain perturbations can systematically undermine their performance on DeepFake detection and image captioning tasks, with the method generalizing across five state-of-the-art VLMs and ten datasets.

Vision-Language Models (VLMs) are increasingly used as perceptual modules for visual content reasoning, including captioning and DeepFake detection. In this work, we expose a critical vulnerability of VLMs to subtle, structured perturbations in the frequency domain. Specifically, we show how these feature transformations undermine authenticity/DeepFake detection and automated image captioning. We design targeted image transformations that operate in the frequency domain and systematically shift VLM outputs on frequency-perturbed real and synthetic images. We demonstrate that the perturbation injection method generalizes across five state-of-the-art VLMs, including Qwen2/2.5 and BLIP models of different parameter counts. Experiments across ten real and generated image datasets reveal that VLM judgments are sensitive to frequency-based cues and may not wholly align with semantic content. Crucially, we show that visually imperceptible spatial-frequency transformations expose the fragility of VLMs deployed for automated image captioning and authenticity detection. Our findings, obtained under realistic black-box constraints, challenge the reliability of VLMs and underscore the need for robust multimodal perception systems.
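The abstract does not spell out the exact transformation, so the following is only a minimal sketch of what a small frequency-domain perturbation can look like, not the authors' method. It assumes a grayscale image stored as a 2-D NumPy array in [0, 1]; the function name `perturb_frequency_band` and the parameters `r_min`, `r_max`, and `gain` are hypothetical and chosen purely for illustration.

```python
import numpy as np


def perturb_frequency_band(image, r_min=0.25, r_max=0.5, gain=1.05):
    """Scale the spectrum inside a radial frequency band (illustrative only).

    image : 2-D array of floats in [0, 1] (grayscale, an assumption here).
    r_min, r_max : band limits in cycles per pixel (hypothetical defaults).
    gain : multiplicative factor applied to coefficients inside the band.
    """
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape

    # Move to the frequency domain.
    spectrum = np.fft.fft2(image)

    # Radial frequency of every FFT coefficient, in cycles per pixel.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    # Scale only the selected band; the DC term (radius 0) is untouched
    # as long as r_min > 0, so mean brightness is preserved before clipping.
    band = (radius >= r_min) & (radius <= r_max)
    spectrum = np.where(band, spectrum * gain, spectrum)

    # Back to pixel space. The mask is symmetric in frequency, so the
    # imaginary part is only numerical round-off and can be discarded.
    perturbed = np.fft.ifft2(spectrum).real
    return np.clip(perturbed, 0.0, 1.0)
```

With a gain close to 1, the per-pixel change stays small while the spectral content of the band shifts. For a colour image, the same function could be applied to each channel separately; whether a given VLM's output changes under such a perturbation would have to be checked empirically.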
