CV · Dec 21, 2015

Harnessing the Deep Net Object Models for Enhancing Human Action Recognition

arXiv:1512.06498v2 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses action recognition for video analysis, but it is incremental as it builds on existing object detection methods.

The study tackles human action recognition by incorporating object information, especially static background objects, using pre-trained deep network object detectors and layer-wise feature encoding. It reports state-of-the-art performance on the HMDB51 and UCF101 datasets.

In this study, the influence of objects is investigated in the scenario of human action recognition with a large number of classes. We hypothesize that the objects humans interact with have a strong say in determining the action being performed. In particular, if the objects are non-moving, such as objects appearing in the background, motion-based features such as spatio-temporal interest points and dense trajectories may fail to detect them. Hence we propose to detect objects statically in every frame using pre-trained object detectors. Pre-trained deep network models are used as the object detectors. Information from different layers, in conjunction with different encoding techniques, is extensively studied to obtain the richest feature vectors. This technique is observed to yield state-of-the-art performance on the HMDB51 and UCF101 datasets.
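As a rough illustration of the pipeline the abstract describes (per-frame features from several network layers, encoded into one video-level vector), here is a minimal sketch. It is not the authors' implementation: the abstract does not name the layers or encoders used, so mean and max pooling with L2 normalization are assumed stand-ins, and `encode_video` and `combine_layers` are hypothetical helper names. The per-frame features are taken as given arrays rather than computed by a real detector.

```python
import numpy as np


def encode_video(frame_features, method="mean"):
    """Pool per-frame features of shape (T, D) into one vector of shape (D,).

    Assumption: mean/max pooling stands in for the unspecified
    "encoding techniques" mentioned in the abstract.
    """
    feats = np.asarray(frame_features, dtype=np.float64)
    if feats.ndim != 2:
        raise ValueError("expected a (num_frames, feature_dim) array")
    if method == "mean":
        pooled = feats.mean(axis=0)
    elif method == "max":
        pooled = feats.max(axis=0)
    else:
        raise ValueError(f"unknown pooling method: {method}")
    # L2-normalize so that layers with different scales are comparable.
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled


def combine_layers(per_layer_features, method="mean"):
    """Encode each layer separately, then concatenate the results.

    per_layer_features: list of (T, D_i) arrays, one per network layer
    (hypothetical input; which layers to use is an open choice here).
    """
    return np.concatenate(
        [encode_video(layer, method) for layer in per_layer_features]
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in activations for a 30-frame clip from two layers.
    layer_a = rng.random((30, 4096))
    layer_b = rng.random((30, 1000))
    video_vector = combine_layers([layer_a, layer_b], method="max")
    print(video_vector.shape)
```

The resulting video-level vector could then be fed to any standard classifier. Because each frame is processed independently, static background objects contribute to the descriptor even when they produce no motion.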

