CV, Apr 4, 2018

Layout-induced Video Representation for Recognizing Agent-in-Place Actions

Ruichi Yu, Hongcheng Wang, Ang Li, Jingxiao Zheng, Vlad I. Morariu, Larry S. Davis

arXiv:1804.01429v3

Originality: Incremental advance

AI Analysis

This work addresses action recognition in surveillance video with the aim of improving scene understanding. It is an incremental advance: it builds on existing methods and adds a novel layout-based representation.

The paper tackles the problem of recognizing agent-in-place actions in outdoor home surveillance by introducing a Layout-Induced Video Representation (LIVR) that encodes scene layout geometry and topology, resulting in significantly better generalization to unseen scenes as demonstrated on a new dataset.

We address the recognition of agent-in-place actions, which are associated with agents who perform them and places where they occur, in the context of outdoor home surveillance. We introduce a representation of the geometry and topology of scene layouts so that a network can generalize from the layouts observed in the training set to unseen layouts in the test set. This Layout-Induced Video Representation (LIVR) abstracts away low-level appearance variance and encodes geometric and topological relationships of places in a specific scene layout. LIVR partitions the semantic features of a video clip into different places to force the network to learn place-based feature descriptions; to predict the confidence of each action, LIVR aggregates features from the place associated with an action and its adjacent places on the scene layout. We introduce the Agent-in-Place Action dataset to show that our method allows neural network models to generalize significantly better to unseen scenes.
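The two steps described in the abstract, partitioning features by place and aggregating over a place and its neighbors, can be illustrated with a toy sketch. This is not the authors' network: the place names, the single-channel feature grid, the use of global max pooling, and the per-place weights are all assumptions chosen to keep the example small.

```python
from typing import Dict, List, Set

Grid = List[List[float]]  # H x W feature map, one channel (assumed for simplicity)


def partition_by_place(features: Grid, place_masks: Dict[str, Grid]) -> Dict[str, Grid]:
    """Multiply the feature map by each place's binary mask, one copy per place."""
    return {
        place: [[f * m for f, m in zip(f_row, m_row)]
                for f_row, m_row in zip(features, mask)]
        for place, mask in place_masks.items()
    }


def pool(grid: Grid) -> float:
    """Global max pooling over one masked feature map (an assumed pooling choice)."""
    return max(max(row) for row in grid)


def action_score(action_place: str,
                 place_features: Dict[str, Grid],
                 adjacency: Dict[str, Set[str]],
                 weights: Dict[str, float]) -> float:
    """Aggregate pooled features from the action's place and its adjacent places only."""
    places = [action_place] + sorted(adjacency.get(action_place, set()))
    return sum(weights.get(p, 0.0) * pool(place_features[p]) for p in places)


# Hypothetical 2 x 3 scene with three places.
features = [[0.2, 0.9, 0.1],
            [0.4, 0.3, 0.8]]
place_masks = {
    "porch":   [[1, 1, 0], [0, 0, 0]],
    "walkway": [[0, 0, 0], [1, 1, 0]],
    "street":  [[0, 0, 1], [0, 0, 1]],
}
# Layout topology: which places touch each other.
adjacency = {
    "porch": {"walkway"},
    "walkway": {"porch", "street"},
    "street": {"walkway"},
}
weights = {"porch": 1.0, "walkway": 1.0, "street": 1.0}  # stand-ins for learned weights

place_features = partition_by_place(features, place_masks)
score = action_score("porch", place_features, adjacency, weights)
```

In this sketch, a porch-related action draws only on the porch and the adjacent walkway; the strong response on the street is ignored because the street is not adjacent to the porch in the assumed layout.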

