Action recognition in still images by latent superpixel classification
This addresses the problem of action recognition for applications like image annotation and video surveillance, but it is incremental as it builds on existing structural models with a new intermediate layer.
The paper tackled action recognition in still images by proposing a method that uses superpixels and latent classes as attributes, achieving an average accuracy of 74.06% for the positive class and 88.50% for the negative class on the Stanford 40 Actions dataset.
Action recognition from still images is an important task of computer vision applications such as image annotation, robotic navigation, video surveillance and several others. Existing approaches mainly rely on either bag-of-feature representations or articulated body-part models. However, the relationship between the action and the image segments is still substantially unexplored. For this reason, in this paper we propose to approach action recognition by leveraging an intermediate layer of "superpixels" whose latent classes can act as attributes of the action. In the proposed approach, the action class is predicted by a structural model(learnt by Latent Structural SVM) based on measurements from the image superpixels and their latent classes. Experimental results over the challenging Stanford 40 Actions dataset report a significant average accuracy of 74.06% for the positive class and 88.50% for the negative class, giving evidence to the performance of the proposed approach.