CV · Dec 4, 2018

Spatio-Temporal Action Graph Networks

arXiv:1812.01233v2 · 41 citations
AI Analysis

This work addresses activity recognition for scenarios like driving where critical events are rare and object interactions are key, offering a more efficient learning approach compared to global descriptors.

The paper tackles the problem of recognizing events involving object interactions in scenes with limited labeled examples by proposing a novel inter-object graph representation with disentangled spatial and temporal embeddings. The model demonstrates significantly improved performance on the Charades benchmark and a new driving dataset with near-collision events.

Events defined by the interaction of objects in a scene are often of critical importance; yet important events may have insufficient labeled examples to train a conventional deep model to generalize to future object appearance. Activity recognition models that represent object interactions explicitly have the potential to learn in a more efficient manner than those that represent scenes with global descriptors. We propose a novel inter-object graph representation for activity recognition based on a disentangled graph embedding with direct observation of edge appearance. We employ a novel factored embedding of the graph structure, disentangling a representation hierarchy formed over spatial dimensions from that found over temporal variation. We demonstrate the effectiveness of our model on the Charades activity recognition benchmark, as well as a new dataset of driving activities focusing on multi-object interactions with near-collision events. Our model offers significantly improved performance compared to baseline approaches without object-graph representations, or with previous graph-based models.
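The factored embedding described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the concatenated edge feature, and the dot-product attention pooling with scoring vectors `w_spatial` and `w_temporal` are all assumptions chosen for illustration. The sketch only shows the general idea of reducing over object pairs within each frame first (spatial), and then reducing over frames (temporal).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors, w):
    """Weighted average of `vectors`. Weights are a softmax over dot products
    with `w`, a hypothetical scoring vector (an assumption, not from the paper)."""
    scores = [sum(x * y for x, y in zip(v, w)) for v in vectors]
    alphas = softmax(scores)
    dim = len(vectors[0])
    return [sum(a * v[d] for a, v in zip(alphas, vectors)) for d in range(dim)]

def edge_feature(obj_i, obj_j, pair_appearance):
    """Assumed edge feature: both object descriptors concatenated with a
    descriptor observed directly on the pair (e.g. its joint image region)."""
    return obj_i + obj_j + pair_appearance

def embed_video(frames, w_spatial, w_temporal):
    """frames: list of (objects, pairs), where `objects` is a list of feature
    vectors and `pairs` maps (i, j) index tuples to pair-appearance vectors."""
    frame_embeddings = []
    for objects, pairs in frames:
        edges = [edge_feature(objects[i], objects[j], pairs[(i, j)])
                 for (i, j) in sorted(pairs)]
        # Spatial stage: reduce all edges of one frame to a single vector.
        frame_embeddings.append(attention_pool(edges, w_spatial))
    # Temporal stage: reduce the per-frame vectors to one video embedding.
    return attention_pool(frame_embeddings, w_temporal)

# Toy input: two frames, two objects each, one object pair per frame.
frames = [
    ([[1.0, 0.0], [0.0, 1.0]], {(0, 1): [2.0]}),
    ([[3.0, 0.0], [0.0, 3.0]], {(0, 1): [4.0]}),
]
zero = [0.0] * 5  # zero scoring vectors make both poolings plain averages
video_embedding = embed_video(frames, zero, zero)
```

Keeping the two stages separate means the spatial pooling never mixes edges from different frames, which is one simple way to read "disentangling" the spatial hierarchy from the temporal one. A video-level classifier would then consume `video_embedding`.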
