CV · Dec 20, 2017

Human Action Recognition: Pose-based Attention draws focus to Hands

Fabien Baradel, Christian Wolf, Julien Mille

arXiv:1712.08002v1 · 15.4121 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of improving accuracy and explainability in human action recognition for computer vision applications, representing an incremental advancement with a novel attention approach.

The authors tackled human action recognition by proposing a pose-based spatio-temporal attention mechanism that automatically focuses on the most involved hands and discriminative moments, achieving state-of-the-art results on the NTU-RGB+D dataset.

We propose a new spatio-temporal attention mechanism for human action recognition that automatically attends to the hands most involved in the studied action and detects the most discriminative moments in an action. Attention is handled in a recurrent manner using a recurrent neural network (RNN) and is fully differentiable. In contrast to standard soft-attention mechanisms, our approach does not use the hidden RNN state as input to the attention model. Instead, attention distributions are extracted using external information: human articulated pose. We performed an extensive ablation study to show the strengths of this approach, and we particularly studied the conditioning aspect of the attention mechanism. We evaluate the method on the largest currently available human action recognition dataset, NTU-RGB+D, and report state-of-the-art results. Another advantage of our model is a degree of explainability: the spatial and temporal attention distributions at test time make it possible to study and verify which parts of the input data the method focuses on.
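The core idea in the abstract, attention weights computed from pose rather than from the recurrent hidden state, can be illustrated with a small sketch. This is not the authors' implementation: the linear scoring function, the choice of four hand crops, the feature sizes, and all names (`pose_attention`, `attend`, `weights`, `bias`) are assumptions made only for illustration, and the weights are hand-picked rather than learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pose_attention(pose, weights, bias):
    """Return one attention weight per hand, computed from the pose only.

    Assumption: a single linear layer maps the pose vector to one score
    per hand. The hidden RNN state is deliberately not an input here.
    """
    scores = [
        sum(w * p for w, p in zip(row, pose)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(scores)


def attend(hand_features, attention):
    """Attention-weighted sum of the per-hand feature vectors."""
    dim = len(hand_features[0])
    return [
        sum(a * feat[d] for a, feat in zip(attention, hand_features))
        for d in range(dim)
    ]


# Toy inputs (hypothetical values): a 3-D pose descriptor and four hand crops
# with 2-D appearance features each.
pose = [0.2, -0.1, 0.5]
weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
bias = [0.0, 0.0, 0.0, 0.0]
hand_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

attention = pose_attention(pose, weights, bias)
pooled = attend(hand_features, attention)
```

Because `attention` is a plain probability distribution over the hands, it can be inspected directly at test time, which is the kind of explainability the abstract refers to. In a recurrent setting, `pooled` would be the per-frame input to the RNN; that wiring is omitted here.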
