Hierarchical Deep Temporal Models for Group Activity Recognition
This work addresses the problem of accurately classifying group activities in videos, with applications such as surveillance and sports analysis. It is an incremental improvement over prior approaches, proposing a new hierarchical method that combines person-level and group-level temporal models.
The paper tackles group activity recognition in videos by modeling temporal dynamics at both the individual and group levels with a two-stage LSTM-based deep model, and demonstrates its efficacy on a new Volleyball Dataset and on the standard Collective Activity Dataset.
In this paper we present an approach for classifying the activity performed by a group of people in a video sequence. This problem of group activity recognition can be addressed by examining individual person actions and their relations. Temporal dynamics exist both at the level of individual person actions and at the level of the group activity. Given a video sequence as input, methods can be developed to capture these dynamics at both the person level and the group level. We build a deep model based on LSTM (long short-term memory) models to capture these dynamics. To model both person-level and group-level dynamics, we present a two-stage deep temporal model for the group activity recognition problem. In our approach, one LSTM model represents the action dynamics of individual people in a video sequence, and another LSTM model aggregates person-level information for group activity recognition. To evaluate our method, we collected a new dataset of volleyball videos labeled with individual and group activities. Experimental results on this new Volleyball Dataset and on the standard Collective Activity Dataset demonstrate the efficacy of the proposed models.
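The two-stage structure described above can be illustrated with a small forward pass in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: per-person inputs are random feature vectors standing in for per-frame appearance descriptors, a single person-level LSTM is shared by all people, person-level outputs are combined per frame by element-wise max-pooling, and the names `LSTMCell` and `TwoStageModel` as well as all dimensions are hypothetical. Weights are random and untrained, so the output is only a correctly shaped probability vector.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


class LSTMCell:
    """Minimal LSTM (forward pass only) with small random, untrained weights."""

    GATES = ("i", "f", "o", "g")  # input, forget, output gates and candidate

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim  # each gate reads [x_t; h_{t-1}]
        self.W = {k: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_dim)] for k in self.GATES}
        self.b = {k: [0.0] * hidden_dim for k in self.GATES}

    def step(self, x, h_prev, c_prev):
        z = list(x) + list(h_prev)
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])]
               for k in self.GATES}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        # c_t = f * c_{t-1} + i * cand,  h_t = o * tanh(c_t)
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, cand)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

    def run(self, seq):
        """Unroll over a sequence; return the hidden state at every step."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in seq:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


class TwoStageModel:
    """Person-level LSTM -> per-frame pooling over people -> group-level LSTM."""

    def __init__(self, feat_dim, person_hidden, group_hidden, num_classes, seed=0):
        rng = random.Random(seed)
        # Stage 1: one LSTM shared by every person (assumption of this sketch).
        self.person_lstm = LSTMCell(feat_dim, person_hidden, rng)
        # Stage 2: group-level LSTM over the pooled person representations.
        self.group_lstm = LSTMCell(person_hidden, group_hidden, rng)
        self.W_out = [[rng.uniform(-0.1, 0.1) for _ in range(group_hidden)]
                      for _ in range(num_classes)]

    def forward(self, tracks):
        """tracks[p][t]: feature vector of person p at frame t (equal lengths)."""
        person_states = [self.person_lstm.run(track) for track in tracks]
        pooled = []
        for t in range(len(tracks[0])):
            frame = [states[t] for states in person_states]
            # Element-wise max over people (assumed pooling; order-invariant).
            pooled.append([max(vals) for vals in zip(*frame)])
        group_states = self.group_lstm.run(pooled)
        # Classify the group activity from the last group-level hidden state.
        return softmax(matvec(self.W_out, group_states[-1]))


if __name__ == "__main__":
    data_rng = random.Random(1)
    # 12 people, 10 frames, 8-dim features per person per frame (hypothetical)
    tracks = [[[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(10)]
              for _ in range(12)]
    model = TwoStageModel(feat_dim=8, person_hidden=6, group_hidden=5, num_classes=8)
    print([round(p, 3) for p in model.forward(tracks)])
```

Max-pooling is chosen here because it makes the group-level input independent of the order in which people are listed in a frame; whether it matches the aggregation used in the paper is an assumption of this sketch.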