Just One Moment: Structural Vulnerability of Deep Action Recognition against One Frame Attack
This work highlights a serious adversarial vulnerability of state-of-the-art action recognition models, a concern for their security and reliability in real-world applications.
This paper investigates the vulnerability of deep action recognition models to a 'one frame attack,' in which a subtle perturbation is applied to a single frame of a video. The study reveals that these models are highly susceptible: such attacks achieve high fooling rates with inconspicuous perturbations.
The video-based action recognition task has been extensively studied in recent years. In this paper, we study the structural vulnerability of deep learning-based action recognition models to adversarial attacks, using the one frame attack, which adds an inconspicuous perturbation to only a single frame of a given video clip. Our analysis shows that the models are highly vulnerable to the one frame attack due to their structural properties. Experiments demonstrate the high fooling rates and inconspicuous characteristics of the attack. Furthermore, we show that strong universal one frame perturbations can be obtained under various scenarios. Our work raises the serious issue of the adversarial vulnerability of state-of-the-art action recognition models from various perspectives.
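To make the idea of perturbing only a single frame concrete, the following is a minimal sketch and not the authors' method or models. It assumes a toy classifier that averages flattened frames and applies a linear layer, and it uses a generic iterative sign-gradient step with an L-infinity bound. The names `one_frame_attack`, `weights`, `frame_idx`, `eps`, and `steps` are hypothetical and chosen for illustration only.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def one_frame_attack(video, weights, label, frame_idx, eps=0.03, steps=10):
    """Perturb a single frame of `video` to raise the loss of a toy model.

    Assumptions (illustrative only, not the paper's setup):
      video   : (T, D) array of flattened frames with values in [0, 1]
      weights : (K, D) linear classifier applied to the mean frame
      label   : index of the true class
    Only frame `frame_idx` is modified, and the perturbation stays
    within an L-infinity ball of radius `eps`.
    """
    num_frames = video.shape[0]
    num_classes = weights.shape[0]
    onehot = np.eye(num_classes)[label]
    adv = video.copy()
    step_size = 2.0 * eps / steps

    for _ in range(steps):
        # Toy model: temporal average pooling followed by a linear layer.
        probs = softmax(weights @ adv.mean(axis=0))
        # Gradient of the cross-entropy loss with respect to one frame.
        grad_frame = weights.T @ (probs - onehot) / num_frames
        # Sign-gradient ascent step applied to the chosen frame only.
        adv[frame_idx] += step_size * np.sign(grad_frame)
        # Project back into the eps-ball and the valid pixel range.
        delta = np.clip(adv[frame_idx] - video[frame_idx], -eps, eps)
        adv[frame_idx] = np.clip(video[frame_idx] + delta, 0.0, 1.0)

    return adv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((16, 64))          # 16 frames, 64 "pixels" each
    w = rng.normal(size=(5, 64))         # 5 hypothetical action classes
    adv_clip = one_frame_attack(clip, w, label=2, frame_idx=7)
    changed = np.any(adv_clip != clip, axis=1)
    print("frames modified:", np.flatnonzero(changed))
```

In this toy setting every frame contributes equally to the prediction, so the choice of frame matters little. A real action recognition network would need gradients from an automatic differentiation framework, and the effect of a single perturbed frame would depend on the structural properties the paper analyzes.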