CV | ML | Jul 21, 2017

Multi-kernel learning of deep convolutional features for action recognition

arXiv:1707.06923v2 | 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses action recognition in video, an area of video understanding that lags behind image understanding. Its contribution is incremental, integrating existing methods.

The paper tackles action recognition in videos by combining multi-kernel SVMs with a multi-stream deep convolutional neural network, achieving close to state-of-the-art performance on the challenging HMDB-51 dataset with 51 classes.

Image understanding using deep convolutional networks has reached human-level performance, yet the closely related problem of video understanding, especially action recognition, has not reached the requisite level of maturity. We combine multi-kernel support vector machines (SVMs) with a multi-stream deep convolutional neural network to achieve close to state-of-the-art performance on a 51-class activity recognition problem (the HMDB-51 dataset). This dataset has proved particularly challenging for deep neural networks because of heterogeneity in camera viewpoints, video quality, and similar factors. The resulting architecture is named pillar networks, as each (very) deep neural network acts as a pillar for the hierarchical classifiers. In addition, we illustrate that hand-crafted features such as improved dense trajectories (iDT) and Multi-skip Feature Stacking (MIFS), used as additional pillars, can further improve performance.
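To make the multi-kernel idea concrete, the following is a minimal sketch of how per-stream features could be fused through a weighted sum of kernels, one common form of multiple-kernel learning. It is not the paper's implementation: the stream names (`spatial`, `temporal`), the kernel chosen for each stream, the fixed weights, and the toy feature vectors are all assumptions made for illustration. In practice the weights would usually be learned rather than fixed.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def linear_kernel(x, y):
    """Plain dot-product kernel."""
    return sum(a * b for a, b in zip(x, y))

def combined_kernel(sample_a, sample_b, kernels, weights):
    """Weighted sum of per-stream kernels.

    Each sample is a dict mapping a stream name to that stream's
    feature vector. With non-negative weights, a sum of valid
    kernels is itself a valid kernel.
    """
    return sum(
        weights[name] * kernels[name](sample_a[name], sample_b[name])
        for name in kernels
    )

def gram_matrix(samples, kernels, weights):
    """Pairwise combined-kernel values for a list of samples."""
    return [
        [combined_kernel(a, b, kernels, weights) for b in samples]
        for a in samples
    ]

# Hypothetical setup: two streams ("pillars"), each with its own kernel.
kernels = {"spatial": rbf_kernel, "temporal": linear_kernel}
weights = {"spatial": 0.5, "temporal": 0.5}  # assumed fixed, not learned

# Toy features standing in for the CNN activations of two video clips.
samples = [
    {"spatial": [1.0, 0.0], "temporal": [0.0, 2.0]},
    {"spatial": [0.0, 1.0], "temporal": [1.0, 0.0]},
]

gram = gram_matrix(samples, kernels, weights)
```

A Gram matrix built this way can be passed to any SVM solver that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Hand-crafted descriptors such as iDT would enter as one more entry in `kernels` and `weights`.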

