CV · Dec 15, 2021

Temporal Shuffling for Defending Deep Action Recognition Models against Adversarial Attacks

arXiv:2112.07921v2 · 9 citations
Originality: Incremental advance
AI Analysis

This paper addresses the security of video action recognition systems by proposing an incremental defense method that improves robustness against adversarial attacks.

The paper tackles the vulnerability of deep action recognition models to adversarial attacks by proposing a defense based on temporal shuffling of input videos. The defense exploits two properties: the models are robust to randomized frame order, and adversarial perturbations are sensitive to temporal destruction. As a result, it requires no additional training.

Recently, video-based action recognition methods using convolutional neural networks (CNNs) have achieved remarkable recognition performance. However, there is still a lack of understanding of the generalization mechanism of action recognition models. In this paper, we suggest that action recognition models rely on motion information less than expected, and thus they are robust to randomization of frame order. Furthermore, we find that the motion monotonicity remaining after randomization also contributes to this robustness. Based on this observation, we develop a novel defense method against adversarial attacks on action recognition models that uses temporal shuffling of input videos. Another observation enabling our defense method is that adversarial perturbations on videos are sensitive to temporal destruction. To the best of our knowledge, this is the first attempt to design a defense method without additional training for 3D CNN-based video action recognition models.
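The idea of shuffling frames at inference time can be illustrated with a minimal sketch. The abstract does not specify the exact shuffling scheme, so this assumes a plain uniform random permutation of the frames; the names `temporal_shuffle`, `defended_predict`, and `toy_model` are hypothetical and do not come from the paper or its code repository.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def temporal_shuffle(
    frames: Sequence[Frame], rng: random.Random
) -> Tuple[List[Frame], List[int]]:
    """Return the frames in a random order, plus the permutation used.

    Assumption: a uniform random permutation over all frames. The paper's
    actual scheme may differ (for example, it may shuffle within clips).
    """
    order = list(range(len(frames)))
    rng.shuffle(order)  # in-place random permutation of the frame indices
    return [frames[i] for i in order], order


def defended_predict(
    model: Callable[[List[Frame]], int],
    frames: Sequence[Frame],
    seed: Optional[int] = None,
) -> int:
    """Shuffle the frame order, then classify with an unmodified model.

    No retraining is involved: the model is used exactly as given.
    """
    rng = random.Random(seed)
    shuffled, _ = temporal_shuffle(frames, rng)
    return model(shuffled)


def toy_model(frames: Sequence[Sequence[int]]) -> int:
    """Hypothetical stand-in classifier that ignores frame order entirely.

    It returns 1 if the mean pixel value over the whole clip exceeds 3.
    """
    total = sum(sum(frame) for frame in frames)
    count = sum(len(frame) for frame in frames)
    return int(total / count > 3)
```

Here each frame is a flat list of integer pixel values purely for illustration; in practice the frames would be image tensors and `model` would be a trained 3D CNN. The toy classifier is order-invariant by construction, so it only demonstrates the mechanics of the wrapper, not the robustness the paper reports.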

Code Implementations: 1 repo
