CV · Mar 22, 2020

Ensembles of Deep Neural Networks for Action Recognition in Still Images

Sina Mohammadi, Sina Ghofrani Majelan, Shahriar B. Shokouhi

arXiv:2003.09893v1 · 20 citations

AI Analysis

This work addresses action recognition in still images, a challenging domain because images carry no motion cues and large labeled datasets are scarce. The contribution is incremental, since it builds on existing techniques such as transfer learning and ensembles.

The paper tackles human action recognition in still images using transfer learning with pre-trained CNNs, an attention mechanism, and ensemble learning, achieving 93.17% accuracy on the Stanford 40 dataset.

Despite the fact that notable improvements have been made recently in the field of feature extraction and classification, human action recognition is still challenging, especially in images, in which, unlike videos, there is no motion. Thus, the methods proposed for recognizing human actions in videos cannot be applied to still images. A big challenge in action recognition in still images is the lack of large enough datasets, which is problematic for training deep Convolutional Neural Networks (CNNs) due to the overfitting issue. In this paper, by taking advantage of pre-trained CNNs, we employ the transfer learning technique to tackle the lack of massive labeled action recognition datasets. Furthermore, since the last layer of the CNN has class-specific information, we apply an attention mechanism on the output feature maps of the CNN to extract more discriminative and powerful features for classification of human actions. Moreover, we use eight different pre-trained CNNs in our framework and investigate their performance on the Stanford 40 dataset. Finally, we propose using the Ensemble Learning technique to enhance the overall accuracy of action classification by combining the predictions of multiple models. The best setting of our method is able to achieve 93.17% accuracy on the Stanford 40 dataset.
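The abstract does not say how the predictions of the individual models are combined. As an illustration only, the sketch below assumes the simplest common choice, averaging per-model softmax probabilities and taking the arg-max class. The function names `softmax` and `ensemble_predict` and the example logits are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(per_model_logits):
    """Average softmax probabilities across models and pick the top class.

    Assumption: unweighted probability averaging. The paper may combine
    predictions differently.
    """
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return best, avg


# Hypothetical logits from three fine-tuned CNNs over four action classes.
example_logits = [
    [2.0, 1.0, 0.1, 0.0],
    [0.5, 2.5, 0.3, 0.1],
    [1.8, 1.2, 0.2, 0.0],
]
label, avg_probs = ensemble_predict(example_logits)
print(label, [round(p, 3) for p in avg_probs])
```

Averaging probabilities rather than raw logits keeps each model's vote on the same scale, which matters when the ensemble members are different architectures with differently calibrated outputs.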
