Unified People Tracking with Graph Neural Networks
This addresses the problem of tracking multiple people in videos for applications like surveillance or robotics, with incremental improvements in handling occlusions and flexibility.
The paper tackles multi-people tracking by proposing a unified, differentiable model that learns to associate detections into trajectories without pre-computed tracklets, achieving state-of-the-art performance on public benchmarks and a new dataset.
This work presents a unified, fully differentiable model for multi-people tracking that learns to associate detections into trajectories without relying on pre-computed tracklets. The model builds a dynamic spatiotemporal graph that aggregates spatial, contextual, and temporal information, enabling seamless information propagation across entire sequences. To improve occlusion handling, the graph can also encode scene-specific information. We also introduce a new large-scale dataset with 25 partially overlapping views, detailed scene reconstructions, and extensive occlusions. Experiments show the model achieves state-of-the-art performance on public benchmarks and the new dataset, with flexibility across diverse conditions. Both the dataset and approach will be publicly released to advance research in multi-people tracking.