CV · Dec 26, 2018

A Multi-Stream Convolutional Neural Network Framework for Group Activity Recognition

arXiv:1812.10328v123 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of accurately recognizing group activities in videos for applications like surveillance or sports analysis, representing an incremental improvement over existing methods.

The paper tackles group activity recognition by proposing a multi-stream CNN framework that fuses predictions from different modalities, including a new pose-based modality, achieving state-of-the-art results with 90.50% accuracy on the Volleyball dataset and 87.01% on the Collective Activity dataset.

In this work, we present a framework based on multi-stream convolutional neural networks (CNNs) for group activity recognition. The CNN streams are trained separately on different modalities, and their predictions are fused at the end. Each stream has two branches that predict the group activity from person-level and scene-level representations. A new modality based on human pose estimation is introduced to add extra information to the model. We evaluate our method on the Volleyball and Collective Activity datasets. Experimental results show that the proposed framework achieves state-of-the-art results with both multiple-frame and single-frame input, reaching 90.50% and 86.61% accuracy on the Volleyball dataset, respectively, and 87.01% accuracy for multiple-frame group activity recognition on the Collective Activity dataset.
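The late-fusion idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so this example assumes simple averaging of softmax probabilities over every stream and branch. The stream names (`rgb`, `pose`), the three-class setup, and the raw scores are hypothetical placeholders, not values from the paper.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_predictions(stream_scores):
    """Average class probabilities over every (stream, branch) pair.

    Assumption: plain averaging stands in for the paper's fusion step,
    whose exact form is not given in the abstract.
    """
    probs = [
        softmax(branch_scores)
        for branches in stream_scores.values()
        for branch_scores in branches.values()
    ]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]

# Hypothetical raw scores for 3 activity classes; each stream has a
# person-level branch and a scene-level branch, as in the abstract.
stream_scores = {
    "rgb": {"person": [2.0, 0.5, 0.1], "scene": [1.5, 1.0, 0.2]},
    "pose": {"person": [0.3, 2.2, 0.4], "scene": [1.8, 0.9, 0.1]},
}

fused = fuse_predictions(stream_scores)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted_class)
```

Because each stream is trained separately and only its output probabilities are combined, a new modality can be added as another dictionary entry without retraining the existing streams.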

