CVMay 17, 2015

Visual Semantic Role Labeling

arXiv:1505.04474v1482 citations
Originality Incremental advance
AI Analysis

This addresses the need for more complete scene understanding in computer vision beyond simple action classification or person bounding boxes, though it is incremental as it builds on existing action recognition tasks.

The paper tackles the problem of Visual Semantic Role Labeling by detecting people performing actions and localizing interacting objects in images, introducing a dataset of 16K people instances in 10K images with annotated actions and semantic roles.

In this paper we introduce the problem of Visual Semantic Role Labeling: given an image we want to detect people doing actions and localize the objects of interaction. Classical approaches to action recognition either study the task of action classification at the image or video clip level or at best produce a bounding box around the person doing the action. We believe such an output is inadequate and a complete understanding can only come when we are able to associate objects in the scene to the different semantic roles of the action. To enable progress towards this goal, we annotate a dataset of 16K people instances in 10K images with actions they are doing and associate objects in the scene with different semantic roles for each action. Finally, we provide a set of baseline algorithms for this task and analyze error modes providing directions for future work.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes