CV · Jul 29, 2024

Adversarial Robustness in RGB-Skeleton Action Recognition: Leveraging Attention Modality Reweighter

arXiv:2407.19981v1 · 1 citation · h-index: 26
Originality: Incremental advance
AI Analysis

The work addresses security concerns for action recognition in critical applications, but it is incremental because it builds on existing multimodal methods.

The paper tackles the problem of adversarial robustness in RGB-skeleton action recognition models by proposing the Attention-based Modality Reweighter (AMR), which re-weights modalities to learn more robust features, resulting in a 43.77% improvement against PGD20 attacks on the NTU-RGB+D 60 dataset compared to SOTA methods.
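For readers unfamiliar with the attack named above, the following is a minimal sketch of an L-infinity projected gradient descent (PGD) attack with 20 iterations, which is how "PGD20" is commonly read. It uses a toy logistic-regression model in plain Python rather than an action recognition network. The function names and the `eps` and `step` values are illustrative assumptions, not the paper's settings, and the random start and input-range clipping often used in practice are left out.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(w, b, x, y):
    """Binary cross-entropy of a logistic model on one example."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def loss_grad_wrt_input(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def pgd_linf(w, b, x, y, eps=0.1, step=0.02, iters=20):
    """Toy L-infinity PGD: signed gradient ascent on the loss, then projection.

    eps, step, and iters are hypothetical values chosen for illustration.
    """
    x_adv = list(x)
    for _ in range(iters):
        grad = loss_grad_wrt_input(w, b, x_adv, y)
        # Move each coordinate in the direction that increases the loss.
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


# Toy model and example (made-up numbers).
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = pgd_linf(w, b, x, y)
clean_loss = bce_loss(w, b, x, y)
adv_loss = bce_loss(w, b, x_adv, y)
```

The perturbation stays within `eps` of the clean input in every coordinate, while the loss on the perturbed input is higher than on the clean one. That is the sense in which the adversarial noise is small but harmful.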

Deep neural networks (DNNs) have been applied to many computer vision tasks and have achieved state-of-the-art (SOTA) performance. However, DNNs misclassify adversarial examples, which are created by adding human-imperceptible adversarial noise to natural examples. This limits the application of DNNs in security-critical fields. To enhance model robustness, previous research has focused primarily on the unimodal domain, such as image recognition and video understanding. Although multimodal learning has achieved advanced performance in various tasks, such as action recognition, research on the robustness of RGB-skeleton action recognition models is scarce. In this paper, we systematically investigate how to improve the robustness of RGB-skeleton action recognition models. We first conducted an empirical analysis of the robustness of different modalities and observed that the skeleton modality is more robust than the RGB modality. Motivated by this observation, we propose the Attention-based Modality Reweighter (AMR), which uses an attention layer to re-weight the two modalities, enabling the model to learn more robust features. AMR is plug-and-play, allowing easy integration with multimodal models. To demonstrate the effectiveness of AMR, we conducted extensive experiments on various datasets. For example, compared to SOTA methods, AMR exhibits a 43.77% improvement against PGD20 attacks on the NTU-RGB+D 60 dataset. Furthermore, it effectively balances the differences in robustness between modalities.
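The abstract does not spell out how the attention layer is built, so the following is only a minimal sketch of the general idea of attention-based modality re-weighting. Each modality's feature vector gets a scalar score, a softmax turns the scores into weights, and the fused feature is the weighted sum. The function `reweight_modalities`, the scoring vector `score_vec`, and all numbers are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reweight_modalities(rgb_feat, skel_feat, score_vec):
    """Fuse two same-sized modality features with softmax attention weights.

    score_vec stands in for learned attention parameters (an assumption);
    in a trained model it would be optimized together with the backbone.
    """
    scores = [dot(score_vec, rgb_feat), dot(score_vec, skel_feat)]
    w_rgb, w_skel = softmax(scores)
    fused = [w_rgb * r + w_skel * s for r, s in zip(rgb_feat, skel_feat)]
    return fused, (w_rgb, w_skel)


# Made-up features and scoring vector, for illustration only.
rgb = [0.2, -0.1, 0.4]
skel = [0.9, 0.3, -0.2]
score_vec = [1.0, 0.5, 0.0]
fused, weights = reweight_modalities(rgb, skel, score_vec)
```

With these made-up inputs the skeleton feature receives the larger weight, which mirrors the stated motivation that the skeleton modality is the more robust of the two. In a real model the weights would be learned rather than fixed by hand.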

