CV AI CL Aug 17, 2025

Adversarial Attacks on VQA-NLE: Exposing and Alleviating Inconsistencies in Visual Question Answering Explanations

Yahsin Yeh, Yilun Wu, Bokai Ruan, Honghan Shuai

arXiv:2508.12430v16.21 citations h-index: 1

Originality: Highly original

AI Analysis

This work addresses security and reliability concerns in VQA-NLE systems, which are important for making AI models transparent, although the attacks and defenses it proposes are incremental.

The paper tackles inconsistent explanations in visual question answering with natural language explanations (VQA-NLE). It exposes vulnerabilities through adversarial attacks on both questions and images, and introduces a knowledge-based mitigation method intended to improve robustness. Evaluations on two standard benchmarks and two widely used VQA-NLE models support the effectiveness of the attacks and the potential of the defense.

Natural language explanations in visual question answering (VQA-NLE) aim to make black-box models more transparent by elucidating their decision-making processes. However, we find that existing VQA-NLE systems can produce inconsistent explanations and reach conclusions without genuinely understanding the underlying context, exposing weaknesses in either their inference pipeline or explanation-generation mechanism. To highlight these vulnerabilities, we not only leverage an existing adversarial strategy to perturb questions but also propose a novel strategy that minimally alters images to induce contradictory or spurious outputs. We further introduce a mitigation method that leverages external knowledge to alleviate these inconsistencies, thereby bolstering model robustness. Extensive evaluations on two standard benchmarks and two widely used VQA-NLE models underscore the effectiveness of our attacks and the potential of knowledge-based defenses, ultimately revealing pressing security and reliability concerns in current VQA-NLE systems.
