CV · Jan 21, 2021

Hierarchical Graph-RNNs for Action Detection of Multiple Activities

arXiv:2101.08581v1 · 2 citations
Originality: Incremental advance
AI Analysis

This paper addresses multi-activity detection in videos, which matters for applications such as surveillance and human-computer interaction. The advance is incremental because it builds on existing RNN and graph-based methods.

The paper tackles the problem of spatially localizing multiple concurrent activities per person in video frames by incorporating temporal scene context and action relations, achieving state-of-the-art results on the AVA dataset.

In this paper, we propose an approach that spatially localizes the activities in a video frame where each person can perform multiple activities at the same time. Our approach takes the temporal scene context as well as the relations of the actions of detected persons into account. While the temporal context is modeled by a temporal recurrent neural network (RNN), the relations of the actions are modeled by a graph RNN. Both networks are trained together and the proposed approach achieves state-of-the-art results on the AVA dataset.
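The data flow described in the abstract can be illustrated with a small sketch. This is not the authors' architecture. It assumes a plain Elman RNN cell, mean aggregation of messages from the other detected persons, and untrained random weights, and every name in it (`rnn_step`, `detect_actions`, `hidden`, `rounds`) is hypothetical. It shows only the three ideas the abstract names: a temporal RNN summarizes scene context, a graph RNN passes information between persons, and independent sigmoid outputs let one person carry several activities at once.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rnn_step(x, h, Wx, Wh):
    """One Elman RNN step: h' = tanh(Wx x + Wh h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(Wx, x), matvec(Wh, h))]


def rand_matrix(rows, cols, rng):
    """Random weights stand in for trained parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def detect_actions(scene_feats, person_feats, num_actions,
                   hidden=4, rounds=2, seed=0):
    rng = random.Random(seed)
    d_scene, d_person = len(scene_feats[0]), len(person_feats[0])

    # Temporal RNN: fold the per-frame scene features into one context vector.
    Wx_t = rand_matrix(hidden, d_scene, rng)
    Wh_t = rand_matrix(hidden, hidden, rng)
    context = [0.0] * hidden
    for frame in scene_feats:
        context = rnn_step(frame, context, Wx_t, Wh_t)

    # Each detected person becomes a graph node, initialized from
    # its own features concatenated with the scene context.
    W_in = rand_matrix(hidden, d_person + hidden, rng)
    states = [[math.tanh(v) for v in matvec(W_in, p + context)]
              for p in person_feats]

    # Graph RNN: each node updates its state from the mean state of the others.
    Wx_g = rand_matrix(hidden, hidden, rng)
    Wh_g = rand_matrix(hidden, hidden, rng)
    for _ in range(rounds):
        new_states = []
        for i, h in enumerate(states):
            others = [s for j, s in enumerate(states) if j != i]
            if others:
                msg = [sum(col) / len(others) for col in zip(*others)]
            else:
                msg = [0.0] * hidden
            new_states.append(rnn_step(msg, h, Wx_g, Wh_g))
        states = new_states

    # Multi-label head: one sigmoid per action, so scores need not sum to 1.
    W_out = rand_matrix(num_actions, hidden, rng)
    return [[1.0 / (1.0 + math.exp(-z)) for z in matvec(W_out, h)]
            for h in states]


# Toy input: 2 frames of 3-d scene features, 3 persons with 2-d features.
scene = [[0.1, 0.2, 0.3], [0.0, 0.4, 0.1]]
persons = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
scores = detect_actions(scene, persons, num_actions=5)
```

Each row of `scores` holds one person's per-action probabilities. Thresholding each entry independently, rather than taking an argmax, is what allows several concurrent activities per person. In the paper both networks are trained jointly, which this untrained sketch does not attempt.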

