CV | Jul 19, 2017

Discriminative convolutional Fisher vector network for action recognition

arXiv:1707.06119v1 | 4 citations
Originality: Incremental advance
AI Analysis

This work addresses human action recognition in video. It is an incremental advance: it makes the classical Fisher vector pipeline trainable discriminatively, end to end.

The authors propose a neural network architecture that implements the Fisher vector processing steps as trainable layers. Used in place of the fully connected layers of popular convolutional networks, it achieves comparable or better classification performance than similar architectures while reducing the number of trainable parameters by a factor of 5.

In this work, we propose a novel neural network architecture for human action recognition in videos. The proposed architecture expresses the processing steps of classical Fisher vector approaches, namely dimensionality reduction by principal component analysis (PCA) projection, Gaussian mixture model (GMM) fitting, and Fisher vector descriptor extraction, as network layers. In contrast to other methods, where these steps are performed consecutively and the corresponding parameters are learned in an unsupervised manner, defining them as a single neural network allows us to refine the whole model discriminatively in an end-to-end fashion. Furthermore, we show that the proposed architecture can replace the fully connected layers in popular convolutional networks, achieving comparable classification performance, or even significantly surpassing the performance of similar architectures, while reducing the total number of trainable parameters by a factor of 5. We also show that our method achieves significant improvements over the classical chain.
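The three steps named in the abstract can be illustrated with a minimal NumPy sketch of the forward pass only. This is not the authors' implementation: the function names (`pca_project`, `gmm_posteriors`, `fisher_vector`), the diagonal-covariance GMM, the random parameters, and the final power and L2 normalization (a common choice in the Fisher vector literature) are all assumptions made for illustration. In the end-to-end setting the abstract describes, the same operations would presumably be written in an automatic differentiation framework so that the projection and mixture parameters receive gradients.

```python
import numpy as np


def pca_project(x, mean, components):
    """Center descriptors and project them: (T, D_in) -> (T, D)."""
    return (x - mean) @ components


def gmm_posteriors(x, weights, means, sigmas):
    """Soft assignments gamma[t, k] under a diagonal-covariance GMM.

    Also returns the standardized residuals z[t, k, d].
    """
    z = (x[:, None, :] - means[None, :, :]) / sigmas[None, :, :]  # (T, K, D)
    # Log-density up to the constant -D/2 * log(2*pi), which cancels
    # when the posteriors are normalized.
    log_prob = -0.5 * np.sum(z ** 2, axis=2) - np.sum(np.log(sigmas), axis=1)
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True), z


def fisher_vector(x, weights, means, sigmas):
    """Fisher vector (gradients w.r.t. means and std devs), length 2*K*D."""
    t = x.shape[0]
    gamma, z = gmm_posteriors(x, weights, means, sigmas)
    g_mu = np.einsum("tk,tkd->kd", gamma, z) / (t * np.sqrt(weights)[:, None])
    g_sigma = np.einsum("tk,tkd->kd", gamma, z ** 2 - 1.0) / (
        t * np.sqrt(2.0 * weights)[:, None]
    )
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))  # power normalization (assumed)
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv  # L2 normalization (assumed)


# Toy usage with random stand-ins for learned parameters (hypothetical sizes).
rng = np.random.default_rng(0)
T, D_in, D, K = 200, 16, 8, 4
x_raw = rng.normal(size=(T, D_in))
mean = x_raw.mean(axis=0)
components, _ = np.linalg.qr(rng.normal(size=(D_in, D)))  # orthonormal basis
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))

fv = fisher_vector(pca_project(x_raw, mean, components), weights, means, sigmas)
print(fv.shape)  # (64,)
```

In the classical chain, `mean`, `components`, `weights`, `means`, and `sigmas` would be estimated without labels and then frozen; the abstract's point is that treating them as layer parameters lets a classification loss refine them jointly.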

