Who did What at Where and When: Simultaneous Multi-Person Tracking and Activity Recognition
This work addresses the challenge of integrated multi-person tracking and activity recognition for applications such as surveillance and robotics, but it appears incremental because it builds on existing graphical models and benchmarks.
The paper tackles the problem of simultaneously tracking multiple people and recognizing their activities at the individual, interaction, and group levels. It uses a bootstrapping framework built on a graphical model and a hypergraph formulation, and reports improvements over state-of-the-art methods on several benchmarks.
We present a bootstrapping framework that simultaneously improves multi-person tracking and activity recognition at the individual, interaction, and social group activity levels. Inference consists of identifying the trajectories of all pedestrian actors, their individual activities, pairwise interactions, and collective activities, given the observed pedestrian detections. Our method uses a graphical model to represent and solve the joint tracking and recognition problems in three stages: (1) activity-aware tracking, (2) joint interaction recognition and occlusion recovery, and (3) collective activity recognition. Visual tracking solves the where and when problems, while recognition solves the who and what problems. High-order correlations among the visible and occluded individuals, pairwise interactions, groups, and activities are then modeled and solved using a hypergraph formulation within a Bayesian framework. Experiments on several benchmarks show the advantages of our approach over state-of-the-art methods.
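To make stages (1) and (3) concrete, here is a toy sketch. It is not the authors' graphical-model or hypergraph inference. It assumes each detection already carries an individual activity label, and it substitutes two deliberately simple techniques. Activity-aware tracking is approximated by greedy frame-to-frame linking whose cost is Euclidean distance plus a penalty `alpha` when activity labels disagree. Collective activity is approximated by a majority vote over the members of a group hyperedge (a set of track indices). The names `Detection`, `association_cost`, `link_frames`, `collective_activity`, `alpha`, and `max_cost` are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    """One pedestrian detection in one frame (hypothetical fields)."""
    x: float
    y: float
    activity: str  # individual activity label, assumed given per detection


def association_cost(prev: Detection, det: Detection, alpha: float) -> float:
    """Assumed cost: Euclidean distance plus a penalty `alpha` when the
    individual activity labels of the two detections disagree."""
    dist = math.hypot(prev.x - det.x, prev.y - det.y)
    return dist + (0.0 if prev.activity == det.activity else alpha)


def link_frames(frames, alpha=2.0, max_cost=5.0):
    """Greedy frame-to-frame linking (a stand-in for joint inference).
    Returns a list of tracks, each a list of Detections."""
    tracks = [[d] for d in frames[0]]
    for dets in frames[1:]:
        # Score every (track, detection) pair and accept the cheapest first.
        pairs = sorted(
            (association_cost(t[-1], d, alpha), ti, di)
            for ti, t in enumerate(tracks)
            for di, d in enumerate(dets)
        )
        used_tracks, used_dets = set(), set()
        for cost, ti, di in pairs:
            if cost > max_cost or ti in used_tracks or di in used_dets:
                continue
            tracks[ti].append(dets[di])
            used_tracks.add(ti)
            used_dets.add(di)
        # Detections left unmatched start new tracks.
        tracks.extend([d] for di, d in enumerate(dets) if di not in used_dets)
    return tracks


def collective_activity(tracks, hyperedge):
    """Label a group hyperedge (a set of track indices) by majority vote
    over its members' individual activity labels."""
    votes = Counter(d.activity for ti in hyperedge for d in tracks[ti])
    return votes.most_common(1)[0][0]


# Two people whose positions cross between frames.
frames = [
    [Detection(0, 0, "walking"), Detection(4, 0, "running")],
    [Detection(2.5, 0, "walking"), Detection(1.5, 0, "running")],
]
print([t[-1].activity for t in link_frames(frames)])             # ['walking', 'running']
print([t[-1].activity for t in link_frames(frames, alpha=0.0)])  # ['running', 'walking']
```

With `alpha=0.0` the linker relies on position alone and swaps the two identities. The activity penalty keeps them apart, which loosely illustrates why tracking and recognition can help each other. A real system would replace the greedy matching with optimal assignment or joint probabilistic inference.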