CV LG Jul 1, 2021

Action Transformer: A Self-Attention Model for Short-Time Pose-Based Human Action Recognition

Vittorio Mazzia, Simone Angarano, Francesco Salvetti, Federico Angelini, Marcello Chiaberge

arXiv:2107.00606v6 18.7176 citations h-index: 28 Has Code

Originality: Incremental advance

AI Analysis

This work addresses real-time action recognition for applications like surveillance or human-computer interaction, but it is incremental as it builds on existing pose-based methods and attention mechanisms.

The paper tackles short-time human action recognition by introducing Action Transformer (AcT), a fully self-attentional model that outperforms architectures mixing convolutional, recurrent and attentive layers while running in real time on 2D pose representations. The authors also release MPOSE2021, a large-scale dataset for benchmarking, and extensive tests on it show the model's effectiveness.

Deep neural networks based purely on attention have been successful across several domains, relying on minimal architectural priors from the designer. In Human Action Recognition (HAR), attention mechanisms have been primarily adopted on top of standard convolutional or recurrent layers, improving the overall generalization capability. In this work, we introduce Action Transformer (AcT), a simple, fully self-attentional architecture that consistently outperforms more elaborate networks that mix convolutional, recurrent and attentive layers. In order to limit computational and energy requirements, building on previous human action recognition research, the proposed approach exploits 2D pose representations over small temporal windows, providing a low-latency solution for accurate and effective real-time performance. Moreover, we open-source MPOSE2021, a new large-scale dataset, as an attempt to build a formal training and evaluation benchmark for real-time, short-time HAR. The proposed methodology was extensively tested on MPOSE2021 and compared to several state-of-the-art architectures, proving the effectiveness of the AcT model and laying the foundations for future work on HAR.
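To make the idea of self-attention over a short window of 2D poses concrete, the following is a minimal pure-Python sketch. It is not the authors' AcT implementation. The window length, keypoint count, model width, number of actions, single attention head, random weights, and mean pooling over time are all assumptions chosen for illustration, and the input poses are random placeholders.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention across the frames of one window."""
    q, k, v = matmul(frames, w_q), matmul(frames, w_k), matmul(frames, w_v)
    scale = math.sqrt(len(q[0]))
    # scores[i][j] measures how strongly frame i attends to frame j.
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# All sizes below are illustrative assumptions, not values from the paper.
NUM_FRAMES = 30          # frames in one short temporal window
NUM_KEYPOINTS = 13       # 2D keypoints per frame
FEATURES = NUM_KEYPOINTS * 2   # (x, y) per keypoint, flattened
D_MODEL = 16             # attention width
NUM_ACTIONS = 5          # number of action classes

rng = random.Random(0)

# Placeholder pose window: one flattened 2D pose vector per frame.
window = [[rng.random() for _ in range(FEATURES)] for _ in range(NUM_FRAMES)]

w_q = random_matrix(FEATURES, D_MODEL, rng)
w_k = random_matrix(FEATURES, D_MODEL, rng)
w_v = random_matrix(FEATURES, D_MODEL, rng)
attended, weights = self_attention(window, w_q, w_k, w_v)

# Assumed readout: average the attended frames, then apply a linear classifier.
pooled = [sum(col) / NUM_FRAMES for col in zip(*attended)]
w_out = random_matrix(D_MODEL, NUM_ACTIONS, rng)
probs = softmax(matmul([pooled], w_out)[0])

print(len(attended), len(attended[0]), len(probs))
```

Because every frame attends to every other frame in the window, the cost of this step grows quadratically with the window length, which is one reason a small temporal window keeps such a model cheap enough for low-latency use.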
