CV

Sep 7, 2015

An end-to-end generative framework for video segmentation and recognition

Hilde Kuehne, Juergen Gall, Thomas Serre

arXiv:1509.01947v2

216 citations

Originality: Incremental advance

AI Analysis

This work addresses video activity recognition and parsing. It is an incremental advance: it builds on existing generative models and adds a specific front-end adaptation.

The paper tackles video segmentation and recognition of human activities by combining reduced Fisher Vectors with a structured temporal model, showing that this generative approach outperforms state-of-the-art methods on larger datasets.

We describe an end-to-end generative approach for the segmentation and recognition of human activities. In this approach, a visual representation based on reduced Fisher Vectors is combined with a structured temporal model for recognition. We show that the statistical properties of Fisher Vectors make them an especially suitable front-end for generative models such as Gaussian mixtures. The system is evaluated on both the recognition of complex activities and their parsing into action units. Using a variety of video datasets ranging from human cooking activities to animal behaviors, our experiments demonstrate that the resulting architecture outperforms state-of-the-art approaches on larger datasets, i.e. when a sufficient amount of data is available for training structured generative models.
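
To make the front-end concrete, here is a minimal sketch of a Fisher Vector encoding. The abstract does not give the exact formulation, so this assumes the commonly used variant: the gradient with respect to the means of a diagonal-covariance Gaussian mixture, followed by power and L2 normalization. The function name `fisher_vector_means` and all parameter names are illustrative, and the dimensionality reduction implied by "reduced" Fisher Vectors is left out.

```python
import numpy as np

def fisher_vector_means(X, weights, means, variances):
    """Fisher Vector of descriptors X (T x D) w.r.t. the means of a
    diagonal-covariance GMM (weights: K, means: K x D, variances: K x D).
    Returns a vector of length K * D. Illustrative sketch only."""
    T = X.shape[0]
    diff = X[:, None, :] - means[None, :, :]               # (T, K, D)
    # Log-density of every descriptor under every component: (T, K)
    log_prob = -0.5 * (np.sum(diff**2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)        # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                # soft assignments
    # Posterior-weighted, variance-normalized deviations from each mean
    grad = np.einsum("tk,tkd->kd", post, diff / np.sqrt(variances))
    grad /= T * np.sqrt(weights)[:, None]
    fv = grad.ravel()
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                 # power normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv                   # L2 normalization

# Toy usage with random data (assumed shapes: 50 descriptors, 4 dims, 3 components)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
fv = fisher_vector_means(X, np.full(3, 1 / 3),
                         rng.normal(size=(3, 4)), np.ones((3, 4)))
```

In a pipeline like the one the abstract describes, vectors of this kind would serve as per-frame or per-window observations for the structured temporal model. That connection is not shown here.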
