Self-Supervised Multi-Object Tracking For Autonomous Driving From Consistency Across Timescales
This work addresses the challenge of accurate object tracking in autonomous driving, particularly under low frame rates or high object dynamics, with incremental improvements over existing self-supervised methods.
The paper tackles the problem of low re-identification accuracy in self-supervised multi-object tracking for autonomous driving by proposing a training objective that enforces consistency across multiple sequential frames, resulting in significant reductions in ID switches and performance on par with fully supervised methods.
Self-supervised multi-object trackers have tremendous potential as they enable learning from raw domain-specific data. However, their re-identification accuracy still falls short compared to their supervised counterparts. We hypothesize that this drawback results from formulating self-supervised objectives that are limited to single frames or frame pairs. Such formulations do not capture sufficient visual appearance variations to facilitate learning consistent re-identification features for autonomous driving when the frame rate is low or object dynamics are high. In this work, we propose a training objective that enables self-supervised learning of re-identification features from multiple sequential frames by enforcing consistent association scores across short and long timescales. We perform extensive evaluations demonstrating that re-identification features trained from longer sequences significantly reduce ID switches on standard autonomous driving datasets compared to existing self-supervised learning methods, which are limited to training on frame pairs. Using our proposed SubCo loss function, we set a new state of the art among self-supervised methods and even perform on par with fully supervised learning methods.
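To make the idea of "consistent association scores across short and long timescales" concrete, the following is a minimal sketch of one possible reading, not the SubCo loss itself, whose exact form the abstract does not give. It assumes that each frame yields one re-identification embedding per object, that association scores are a temperature-scaled softmax over cosine similarities, and that consistency means the chain of frame-to-frame associations should agree with the direct association between the first and last frame. All function names, the temperature value, and the squared-error penalty are hypothetical choices made for illustration.

```python
import math


def cosine(a, b, eps=1e-12):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def softmax(row, temperature):
    """Numerically stable softmax over one row of similarities."""
    scaled = [v / temperature for v in row]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def association(src, dst, temperature=0.1):
    """Row-stochastic association scores from objects in src to objects in dst."""
    return [softmax([cosine(s, d) for d in dst], temperature) for s in src]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def timescale_consistency_loss(frames, temperature=0.1):
    """Mean squared gap between chained short-range and direct long-range scores.

    frames: list of frames, each a list of per-object embedding vectors.
    Assumption (hypothetical, for illustration): the product of consecutive
    frame-to-frame association matrices should match the association matrix
    computed directly between the first and last frame.
    """
    chained = association(frames[0], frames[1], temperature)
    for prev, nxt in zip(frames[1:-1], frames[2:]):
        chained = matmul(chained, association(prev, nxt, temperature))
    direct = association(frames[0], frames[-1], temperature)
    gaps = [
        (c - d) ** 2
        for row_c, row_d in zip(chained, direct)
        for c, d in zip(row_c, row_d)
    ]
    return sum(gaps) / len(gaps)


# Two objects with distinct embeddings that stay stable over three frames.
stable = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(3)]
# Same sequence, but the middle frame has indistinguishable embeddings.
ambiguous = [stable[0], [[1.0, 1.0], [1.0, 1.0]], stable[2]]

loss_stable = timescale_consistency_loss(stable)
loss_ambiguous = timescale_consistency_loss(ambiguous)
```

In this toy setup the stable sequence yields a loss close to zero, while the sequence with an ambiguous middle frame is penalized, because the chained short-range associations become uniform and no longer agree with the confident direct long-range association. A real training setup would compute the same quantity on learned embeddings in an automatic differentiation framework and would have to handle objects that appear or disappear between frames, which this sketch ignores.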