CV · Apr 27, 2017

Action Understanding with Multiple Classes of Actors

arXiv:1704.08723v1 · 5 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses computer vision's narrow focus on a single type of actor by providing a dataset and benchmark for broader action understanding, though its contribution is incremental in that it extends existing methods to new data.

The paper tackles the problem of action understanding by jointly considering multiple types of actors beyond just human adults, and it shows that joint modeling of actor and action improves performance over independent modeling, with further gains from multi-scale video analysis.

Despite rapid progress, existing work on action understanding focuses strictly on one type of action agent, which we call the actor: a human adult. This ignores the diversity of actions performed by other actors. To overcome this narrow viewpoint, our paper marks the first effort in the computer vision community to jointly consider algorithmic understanding of various types of actors undergoing various actions. To begin with, we collect a large annotated Actor-Action Dataset (A2D) that consists of 3782 short videos and 31 temporally untrimmed long videos. We formulate the general actor-action understanding problem and instantiate it at various granularities: video-level single- and multiple-label actor-action recognition, and pixel-level actor-action segmentation. We propose and examine a comprehensive set of graphical models that consider the various types of interplay among actors and actions. Our findings provide conclusive evidence that joint modeling of actor and action improves performance over modeling each of them independently, and that further improvement can be obtained by considering the multi-scale nature of video understanding. Hence, our paper establishes the value of explicitly considering various actors in comprehensive action understanding and provides a dataset and a benchmark for later work exploring this new problem.

