CV AI CL
Dec 8, 2021

SNEAK: Synonymous Sentences-Aware Adversarial Attack on Natural Language Video Localization

Wenbo Gou, Wen Shi, Jian Lou, Lijie Huang, Pan Zhou, Ruixuan Li

arXiv:2112.04154v12.61 citations

Originality: Highly original

AI Analysis

This work addresses security concerns for NLVL systems used in video retrieval and surveillance, though it is incremental in that it builds on existing adversarial attack research.

The paper tackles the adversarial vulnerability of natural language video localization (NLVL) models by proposing SNEAK, a new attack method that exploits cross-modality interplay and achieves a 20% success rate in fooling models.

Natural language video localization (NLVL) is an important task in vision-language understanding. It calls for an in-depth understanding not only of the computer vision and natural language sides individually but, more importantly, of the interplay between them. Adversarial vulnerability is well recognized as a critical security issue for deep neural network models and requires prudent investigation. Although adversarial robustness has been studied extensively, if separately, in video and language tasks, it is less well understood in joint vision-language tasks such as NLVL. This paper therefore aims to comprehensively investigate the adversarial robustness of NLVL models by examining three facets of vulnerability from both the attack and defense perspectives. To achieve the attack goal, we propose a new adversarial attack paradigm called synonymous sentences-aware adversarial attack on NLVL (SNEAK), which captures the cross-modality interplay between the vision and language sides.
