CR · CV · Aug 26, 2025

Hidden Tail: Adversarial Image Causing Stealthy Resource Consumption in Vision-Language Models

arXiv:2508.18805v1 · 1 citation · h-index: 32 · Has Code
Originality Highly original
AI Analysis

This paper addresses a security vulnerability in VLMs deployed in real-world applications, though the contribution is incremental: it builds on prior attacks by improving their stealthiness.

The paper proposes Hidden Tail, a stealthy resource consumption attack on Vision-Language Models (VLMs). It crafts adversarial images that induce maximum-length outputs padded with tokens invisible to the user, increasing output length by up to 19.2× while preserving stealthiness.
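To make the "invisible tokens" idea concrete, here is a minimal toy sketch in pure Python. It is not the paper's code: the vocabulary, the token ids, the choice of `<pad>` as the hidden token, and the cap `MAX_NEW_TOKENS` are all assumptions made for illustration. The toy `decode` mimics the common behaviour of dropping special tokens before text is shown to a user (similar in spirit to the `skip_special_tokens` option of Hugging Face tokenizers), so the visible answer is unchanged while the number of generated tokens grows to the cap.

```python
# Toy vocabulary. Ids 0 and 1 stand in for special tokens that a chat UI hides.
# All ids and token names here are hypothetical.
SPECIAL_IDS = {0: "<eos>", 1: "<pad>"}
VOCAB = {2: "A", 3: "cat", 4: "on", 5: "a", 6: "mat"}


def decode(token_ids, skip_special_tokens=True):
    """Join tokens into text, optionally dropping special tokens."""
    words = []
    for t in token_ids:
        if t in SPECIAL_IDS:
            if not skip_special_tokens:
                words.append(SPECIAL_IDS[t])
        else:
            words.append(VOCAB[t])
    return " ".join(words)


MAX_NEW_TOKENS = 32  # assumed generation cap

# Benign output: the answer, then EOS stops generation.
benign = [2, 3, 4, 5, 6, 0]

# Attacked output: the same answer, then a tail of hidden tokens up to the cap.
answer = benign[:-1]
attacked = answer + [1] * (MAX_NEW_TOKENS - len(answer))

visible_same = decode(benign) == decode(attacked)
length_ratio = len(attacked) / len(benign)

print(decode(attacked))          # what the user sees
print(len(benign), len(attacked))  # what the server pays for
```

In this toy setting the user-facing text is identical in both cases, but the attacked sequence consumes the full token budget. The ratio here depends entirely on the assumed cap and answer length and says nothing about the paper's reported 19.2× figure.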

Vision-Language Models (VLMs) are increasingly deployed in real-world applications, but their high inference cost makes them vulnerable to resource consumption attacks. Prior attacks attempt to extend VLM output sequences by optimizing adversarial images, thereby increasing inference costs. However, these extended outputs often introduce irrelevant abnormal content, compromising attack stealthiness. This trade-off between effectiveness and stealthiness poses a major limitation for existing attacks. To address this challenge, we propose Hidden Tail, a stealthy resource consumption attack that crafts prompt-agnostic adversarial images, inducing VLMs to generate maximum-length outputs by appending special tokens invisible to users. Our method employs a composite loss function that balances semantic preservation, repetitive special token induction, and suppression of the end-of-sequence (EOS) token, optimized via a dynamic weighting strategy. Extensive experiments show that Hidden Tail outperforms existing attacks, increasing output length by up to 19.2× and reaching the maximum token limit, while preserving attack stealthiness. These results highlight the urgent need to improve the robustness of VLMs against efficiency-oriented adversarial threats. Our code is available at https://github.com/zhangrui4041/Hidden_Tail.
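The abstract names three loss terms and a dynamic weighting strategy but does not give their exact form. The sketch below shows one plausible way such a composite loss could be assembled over per-position logits, using NumPy. Everything specific is an assumption and not taken from the paper: the function names `composite_loss` and `dynamic_weights`, the use of cross-entropy for the first two terms, mean EOS probability for the third, and the magnitude-proportional weighting heuristic. The authors' repository is the reference for the actual method.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def composite_loss(logits, target_ids, special_id, eos_id, weights):
    """Weighted sum of three assumed loss terms.

    logits: array of shape (T, V). The first len(target_ids) positions should
    reproduce the clean answer; the remaining positions form the hidden tail.
    """
    logp = log_softmax(logits)
    n_sem = len(target_ids)
    # Semantic preservation: cross-entropy against the clean answer tokens.
    l_sem = -logp[np.arange(n_sem), target_ids].mean()
    # Special token induction: cross-entropy toward the special token on the tail.
    l_tail = -logp[n_sem:, special_id].mean()
    # EOS suppression: mean probability assigned to EOS at every position.
    l_eos = np.exp(logp[:, eos_id]).mean()
    terms = np.array([l_sem, l_tail, l_eos])
    return float(weights @ terms), terms


def dynamic_weights(terms, eps=1e-12):
    """Assumed heuristic: weight each term by its share of the total loss."""
    return terms / (terms.sum() + eps)


# Toy setup: 6 positions, vocabulary of 5, ids chosen arbitrarily.
T, V = 6, 5
target_ids = np.array([2, 3, 4])
special_id, eos_id = 1, 0
uniform = np.full(3, 1.0 / 3.0)

# Logits that follow the attack goal: answer first, then special tokens.
good = np.zeros((T, V))
good[np.arange(3), target_ids] = 10.0
good[3:, special_id] = 10.0

# Logits that put all mass on EOS everywhere.
bad = np.zeros((T, V))
bad[:, eos_id] = 10.0

good_total, good_terms = composite_loss(good, target_ids, special_id, eos_id, uniform)
bad_total, bad_terms = composite_loss(bad, target_ids, special_id, eos_id, uniform)
reweighted = dynamic_weights(bad_terms)
```

In an actual attack these terms would be differentiated with respect to the adversarial image through the VLM; this sketch only evaluates the objective on fixed logits to show how the three goals combine into one scalar.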
