CV · AI · CL · Dec 8, 2021

SNEAK: Synonymous Sentences-Aware Adversarial Attack on Natural Language Video Localization

arXiv:2112.04154v1 · 1 citation
Originality: Highly original
AI Analysis

The work addresses security concerns for NLVL systems used in video retrieval and surveillance, although it is incremental in that it builds on existing adversarial-attack research.

The paper studies the adversarial vulnerability of natural language video localization (NLVL) models and proposes SNEAK, a new attack method that exploits cross-modality interplay and achieves a 20% success rate in fooling models.

Natural language video localization (NLVL) is an important task in vision-language understanding that calls for an in-depth understanding not only of the computer vision and natural language sides individually but, more importantly, of the interplay between them. Adversarial vulnerability is a well-recognized critical security issue for deep neural network models and warrants careful investigation. Despite extensive but separate studies of video and language tasks, the current understanding of adversarial robustness in joint vision-language tasks such as NLVL is less developed. This paper therefore aims to comprehensively investigate the adversarial robustness of NLVL models by examining three facets of vulnerability from both the attack and defense perspectives. To achieve the attack goal, we propose a new adversarial attack paradigm called synonymous sentences-aware adversarial attack on NLVL (SNEAK), which captures the cross-modality interplay between the vision and language sides.
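An NLVL model takes a video and a sentence query and returns a temporal segment (start, end). The summary above does not say how a "successful" attack is counted, so the sketch below assumes a common NLVL convention: a prediction is correct when its temporal IoU with the ground-truth segment reaches a threshold (0.5 here), and an attack succeeds on a sample that was localized correctly before perturbation but not after. The function names `temporal_iou` and `attack_success_rate`, the threshold, and this success criterion are illustrative assumptions, not SNEAK's actual evaluation protocol.

```python
from typing import Sequence, Tuple

# A temporal segment on the video timeline: (start_seconds, end_seconds).
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def attack_success_rate(
    clean_preds: Sequence[Segment],
    adv_preds: Sequence[Segment],
    gts: Sequence[Segment],
    iou_threshold: float = 0.5,  # assumed threshold, not taken from the paper
) -> float:
    """Hypothetical criterion: among samples the model localized correctly
    on clean input, the fraction it localizes incorrectly after the attack."""
    attacked = 0
    fooled = 0
    for clean, adv, gt in zip(clean_preds, adv_preds, gts):
        if temporal_iou(clean, gt) >= iou_threshold:
            attacked += 1  # only clean successes count as attack targets
            if temporal_iou(adv, gt) < iou_threshold:
                fooled += 1  # the perturbation pushed the prediction off target
    return fooled / attacked if attacked else 0.0


if __name__ == "__main__":
    gts = [(10.0, 20.0), (0.0, 5.0), (30.0, 40.0)]
    clean = [(11.0, 19.0), (0.0, 4.0), (0.0, 2.0)]
    adv = [(25.0, 35.0), (0.0, 4.5), (10.0, 12.0)]
    print(attack_success_rate(clean, adv, gts))  # 0.5
```

In this toy run, the third sample is already wrong on clean input and is excluded, the first is fooled, and the second survives, giving a rate of 1/2. Conditioning on clean successes keeps the model's baseline errors from inflating the attack's apparent effectiveness.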

