CV · Apr 4, 2018

Layout-induced Video Representation for Recognizing Agent-in-Place Actions

arXiv:1804.01429v3
Originality: Incremental advance
AI Analysis

The paper addresses action recognition in surveillance video as a route to better scene understanding. The advance is incremental: it builds on existing methods, and its main contribution is a new scene-layout representation.

The paper tackles the recognition of agent-in-place actions in outdoor home surveillance. It introduces a Layout-Induced Video Representation (LIVR) that encodes the geometry and topology of a scene layout, and shows on a new dataset that this representation generalizes significantly better to unseen scenes.

We address the recognition of agent-in-place actions, which are associated with agents who perform them and places where they occur, in the context of outdoor home surveillance. We introduce a representation of the geometry and topology of scene layouts so that a network can generalize from the layouts observed in the training set to unseen layouts in the test set. This Layout-Induced Video Representation (LIVR) abstracts away low-level appearance variance and encodes geometric and topological relationships of places in a specific scene layout. LIVR partitions the semantic features of a video clip into different places to force the network to learn place-based feature descriptions; to predict the confidence of each action, LIVR aggregates features from the place associated with an action and its adjacent places on the scene layout. We introduce the Agent-in-Place Action dataset to show that our method allows neural network models to generalize significantly better to unseen scenes.
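The two steps the abstract describes, partitioning features by place and then scoring each action from its own place plus the adjacent places, can be illustrated with a small sketch. This is not the authors' network: it assumes a time-pooled feature map of shape (H, W, C), a boolean mask per place, masked average pooling as the partitioning step, a plain mean over the place neighbourhood as the aggregation, and a linear scoring head. The function names, place names, action names, and weights are all hypothetical.

```python
import numpy as np


def place_descriptors(features, place_masks):
    """Partition a feature map into places via masked average pooling.

    features:    (H, W, C) semantic feature map (assumed already pooled over time).
    place_masks: dict place name -> (H, W) boolean mask of that place's pixels.
    Returns a dict place name -> (C,) descriptor.
    """
    descriptors = {}
    for place, mask in place_masks.items():
        pixels = features[mask]  # (N, C): feature vectors that fall inside this place
        if pixels.shape[0] == 0:
            descriptors[place] = np.zeros(features.shape[-1])
        else:
            descriptors[place] = pixels.mean(axis=0)
    return descriptors


def action_scores(descriptors, adjacency, action_place, weights):
    """Score each action from its own place plus adjacent places on the layout.

    adjacency:    dict place -> iterable of neighbouring places (layout topology).
    action_place: dict action -> place where the action occurs.
    weights:      dict action -> (C,) weights of a hypothetical linear head.
    """
    scores = {}
    for action, place in action_place.items():
        neighbours = [p for p in adjacency.get(place, ()) if p in descriptors]
        pooled = np.mean([descriptors[p] for p in [place] + neighbours], axis=0)
        scores[action] = float(weights[action] @ pooled)
    return scores


# Toy 1x3 layout with C=2 channels: street | sidewalk | porch (hypothetical).
features = np.array([[[4.0, 0.0], [0.0, 2.0], [2.0, 2.0]]])
masks = {
    "street": np.array([[True, False, False]]),
    "sidewalk": np.array([[False, True, False]]),
    "porch": np.array([[False, False, True]]),
}
# The street touches only the sidewalk; the porch touches only the sidewalk.
adjacency = {"street": ["sidewalk"], "sidewalk": ["street", "porch"], "porch": ["sidewalk"]}
action_place = {"porch_action": "porch", "street_action": "street"}
weights = {"porch_action": np.array([1.0, 1.0]), "street_action": np.array([1.0, 0.0])}

scores = action_scores(place_descriptors(features, masks), adjacency, action_place, weights)
print(scores)  # porch: mean([2,2],[0,2]) = [1,2] -> 3.0; street: mean([4,0],[0,2]) = [2,1] -> 2.0
```

Because the street is not adjacent to the porch, its features never reach the porch action's score. In this sketch, that restriction is what ties predictions to layout topology rather than to where places happen to appear in the image.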
