IA-MOT: Instance-Aware Multi-Object Tracking with Motion Consistency
This work addresses a key problem in computer vision for applications like autonomous driving and surveillance by improving tracking accuracy in dynamic environments, though it appears incremental as it builds on existing tracking-by-detection methods.
The paper tackles multiple object tracking in both static and moving camera scenarios by proposing an instance-aware framework that integrates appearance features and motion consistency, achieving first place in Track 3 of the BMTT Challenge at the CVPR 2020 workshops.
Multiple object tracking (MOT) is a crucial task in the computer vision community. However, most tracking-by-detection MOT methods, with available detected bounding boxes, cannot effectively handle static, slow-moving and fast-moving camera scenarios simultaneously due to ego-motion and frequent occlusion. In this work, we propose a novel tracking framework, called "instance-aware MOT" (IA-MOT), that can track multiple objects with either static or moving cameras by jointly considering instance-level features and object motions. First, robust appearance features are extracted from a variant of the Mask R-CNN detector with an additional embedding head, by passing the given detections as region proposals. Meanwhile, spatial attention, which focuses on the foreground within the bounding boxes, is generated from the given instance masks and applied to the extracted embedding features. In the tracking stage, object instance masks are aligned by feature similarity and motion consistency using the Hungarian association algorithm. Moreover, object re-identification (ReID) is incorporated to recover from ID switches caused by long-term occlusion or missed detections. Overall, when evaluated on the MOTS20 and KITTI-MOTS datasets, our proposed method won first place in Track 3 of the BMTT Challenge at the CVPR 2020 workshops.
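The association step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (masked_embedding, box_iou, associate), the weight w_app, the threshold max_cost, and the use of bounding-box IoU as a stand-in for the motion-consistency term are all assumptions made for illustration. The sketch pools an embedding over the foreground pixels of an instance mask, builds a cost matrix that mixes appearance distance with an overlap-based motion cost, and solves the assignment with SciPy's linear_sum_assignment, which computes an optimal (Hungarian-style) matching.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def masked_embedding(feature_map, mask):
    """Pool a (C, H, W) feature map over the foreground of an (H, W) mask.

    The mask acts as a hard spatial attention map (an assumption; the
    paper's attention may be soft). Returns an L2-normalized vector.
    """
    weights = mask.astype(np.float64)
    total = weights.sum()
    if total == 0:  # empty mask: fall back to plain average pooling
        weights = np.ones_like(weights)
        total = weights.sum()
    emb = (feature_map * weights).sum(axis=(1, 2)) / total
    return emb / (np.linalg.norm(emb) + 1e-12)


def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(track_embs, track_boxes, det_embs, det_boxes,
              w_app=0.5, max_cost=0.7):
    """Match tracks to detections; returns a list of (track, det) index pairs.

    cost = w_app * (1 - cosine similarity) + (1 - w_app) * (1 - IoU).
    w_app and max_cost are hypothetical values, not taken from the paper.
    """
    cost = np.zeros((len(track_embs), len(det_embs)))
    for i, (t_emb, t_box) in enumerate(zip(track_embs, track_boxes)):
        for j, (d_emb, d_box) in enumerate(zip(det_embs, det_boxes)):
            appearance = 1.0 - float(np.dot(t_emb, d_emb))
            motion = 1.0 - box_iou(t_box, d_box)
            cost[i, j] = w_app * appearance + (1.0 - w_app) * motion
    rows, cols = linear_sum_assignment(cost)  # minimizes total cost
    # Discard matches that are too costly; those tracks and detections
    # stay unmatched and could be handed to a later ReID stage.
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if cost[r, c] <= max_cost]
```

Pairs rejected by the max_cost gate are left unmatched rather than forced together, which is where a ReID step like the one mentioned in the abstract would take over to reconnect identities after long occlusions.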