CV ML, Jul 21, 2017

Multi-kernel learning of deep convolutional features for action recognition

arXiv:1707.06923v2, 1.71 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses action recognition in video, a problem on which progress lags behind image understanding. Its contribution is incremental: it integrates existing methods rather than introducing new ones.

The paper tackles action recognition in videos by combining multi-kernel SVMs with a multi-stream deep convolutional neural network, achieving close to state-of-the-art performance on the challenging HMDB-51 dataset with 51 classes.

Image understanding using deep convolutional networks has reached human-level performance, yet the closely related problem of video understanding, especially action recognition, has not reached the requisite level of maturity. We combine multi-kernel support vector machines (SVMs) with a multi-stream deep convolutional neural network to achieve close to state-of-the-art performance on a 51-class activity recognition problem (the HMDB-51 dataset); this dataset has proved particularly challenging for deep neural networks because of heterogeneity in camera viewpoints, video quality, and similar factors. The resulting architecture is named pillar networks, as each (very) deep neural network acts as a pillar for the hierarchical classifiers. In addition, we illustrate that hand-crafted features such as improved dense trajectories (iDT) and Multi-skip Feature Stacking (MIFS), used as additional pillars, can further improve performance.
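The core idea of a multi-kernel classifier over several feature streams can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not state which kernels or weights are used, so the RBF and linear kernels, the fixed equal weights, the toy feature values, and all function names (`rbf_kernel`, `linear_kernel`, `gram`, `combine_kernels`) are assumptions made for the example. A full multi-kernel learning method would typically learn the weights rather than fix them.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel between two feature vectors (gamma is an assumed value)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def linear_kernel(x, y):
    """Plain dot product between two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def gram(rows, cols, kernel):
    """Kernel (Gram) matrix with entry [i][j] = kernel(rows[i], cols[j])."""
    return [[kernel(x, y) for y in cols] for x in rows]


def combine_kernels(grams, weights):
    """Convex combination of per-stream Gram matrices.

    Non-negative weights that sum to 1 keep the result a valid kernel
    whenever each input matrix is one.
    """
    if len(grams) != len(weights):
        raise ValueError("need exactly one weight per Gram matrix")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n_rows, n_cols = len(grams[0]), len(grams[0][0])
    return [
        [sum(w * g[i][j] for g, w in zip(grams, weights)) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy features for four clips from two hypothetical streams ("pillars"),
# e.g. one appearance network and one motion network.
spatial = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
temporal = [[1.0], [0.9], [0.0], [0.1]]

k_spatial = gram(spatial, spatial, rbf_kernel)
k_temporal = gram(temporal, temporal, linear_kernel)

# Equal weights are an assumption; they are not taken from the paper.
k_combined = combine_kernels([k_spatial, k_temporal], [0.5, 0.5])
```

The combined matrix `k_combined` could then be handed to any SVM implementation that accepts a precomputed kernel. Hand-crafted features such as iDT would enter the same way, as one more Gram matrix with its own weight.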
