CV · Jun 5, 2022

3D Convolutional with Attention for Action Recognition

Labina Shrestha, Shikha Dubey, Farrukh Olimov, Muhammad Aasim Rafique, Moongu Jeon

arXiv:2206.02203v11.41 citations · h-index: 42

Originality: Incremental advance

AI Analysis

This work addresses the challenge of high computational cost in action recognition for computer vision applications, but it is incremental as it builds on existing methods with a simpler design.

The paper tackles the problem of computationally expensive models for human action recognition by proposing a simpler deep neural network architecture combining 3D convolutional layers, fully connected layers, and an attention mechanism, achieving competitive performance on the UCF-101 dataset.

Human action recognition is a challenging task in computer vision. Current action recognition methods rely on computationally expensive models to learn the spatio-temporal dependencies of an action. Examples of such complex models include models that process RGB channels and optical flow separately, models that use a two-stream fusion technique, and models that combine a convolutional neural network (CNN) with a long short-term memory (LSTM) network. Fine-tuning such models is computationally expensive as well. This paper proposes a deep neural network architecture for learning these dependencies that consists of a 3D convolutional layer, fully connected (FC) layers, and an attention layer; it is simpler to implement and gives competitive performance on the UCF-101 dataset. The proposed method first learns spatial and temporal features of actions through a 3D-CNN, and the attention mechanism then helps the model focus on the features essential for recognition.
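The pipeline described in the abstract (3D convolution, then attention, then FC layers) can be illustrated with a minimal NumPy sketch of a forward pass. It is not the authors' implementation: the abstract gives no layer sizes or attention formulation, so the kernel size, channel counts, the softmax-weighted pooling used as the attention step, and all names (`conv3d`, `forward`, `init_params`, `w_att`) are assumptions chosen for illustration. Only the 101 output classes follow from the UCF-101 dataset.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d(clip, kernels):
    """Valid, stride-1 3D convolution (cross-correlation, as in deep learning).

    clip:    (C_in, T, H, W) video clip
    kernels: (C_out, C_in, kT, kH, kW)
    returns: (C_out, T', H', W')
    """
    _, _, kt, kh, kw = kernels.shape
    # windows has shape (C_in, T', H', W', kT, kH, kW)
    windows = sliding_window_view(clip, (kt, kh, kw), axis=(1, 2, 3))
    return np.einsum("cthwijk,ocijk->othw", windows, kernels)


def softmax(x):
    # Subtract the max for numerical stability.
    e = np.exp(x - x.max())
    return e / e.sum()


def init_params(rng, c_in=3, c_out=8, k=3, hidden=16, n_classes=101):
    # All sizes are illustrative assumptions, except n_classes (UCF-101).
    return {
        "kernels": rng.normal(0, 0.1, (c_out, c_in, k, k, k)),
        "w_att": rng.normal(0, 0.1, c_out),
        "w1": rng.normal(0, 0.1, (c_out, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0, 0.1, (hidden, n_classes)),
        "b2": np.zeros(n_classes),
    }


def forward(clip, params):
    # 1) Spatio-temporal features from a single 3D convolution + ReLU.
    feats = np.maximum(conv3d(clip, params["kernels"]), 0.0)
    # 2) Attention (assumed form): score each spatio-temporal position,
    #    normalise the scores with softmax, and pool features by weight.
    positions = feats.reshape(feats.shape[0], -1).T   # (N, C_out)
    weights = softmax(positions @ params["w_att"])    # (N,)
    pooled = weights @ positions                      # (C_out,)
    # 3) Fully connected layers producing class logits.
    hidden = np.maximum(pooled @ params["w1"] + params["b1"], 0.0)
    logits = hidden @ params["w2"] + params["b2"]
    return logits, weights


rng = np.random.default_rng(0)
params = init_params(rng)
clip = rng.normal(size=(3, 8, 16, 16))  # random stand-in for an RGB clip
logits, weights = forward(clip, params)
```

Here `weights` holds one attention weight per spatio-temporal position of the feature map, so inspecting it shows which positions the classifier relies on. A trained model would learn `kernels`, `w_att`, and the FC weights by backpropagation, which this sketch omits.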
