CV · Nov 14, 2023

Contrastive Learning for Multi-Object Tracking with Transformers

arXiv:2311.08043v1 · 24 citations · h-index: 191
AI Analysis

The paper addresses multi-object tracking for applications such as autonomous driving, offering a simpler and more efficient approach than prior DETR extensions that rely on expensive added modules.

The paper tackles multi-object tracking by adapting DETR with an instance-level contrastive loss and lightweight modifications, achieving a +2.6 mMOTA improvement on BDD100K and competitive results on MOT17.

The DEtection TRansformer (DETR) opened new possibilities for object detection by modeling it as a translation task: converting image features into object-level representations. Previous works typically add expensive modules to DETR to perform Multi-Object Tracking (MOT), resulting in more complicated architectures. We instead show how DETR can be turned into a MOT model by employing an instance-level contrastive loss, a revised sampling strategy and a lightweight assignment method. Our training scheme learns object appearances while preserving detection capabilities and with little overhead. Its performance surpasses the previous state-of-the-art by +2.6 mMOTA on the challenging BDD100K dataset and is comparable to existing transformer-based methods on the MOT17 dataset.
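The abstract names an instance-level contrastive loss but does not spell out its form. As a rough illustration only, the sketch below uses a generic InfoNCE-style (supervised contrastive) loss over object embeddings, where embeddings sharing an instance ID are treated as positives and all others as negatives. The function names, the `temperature` value, and the toy embeddings are assumptions made for this example and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def instance_contrastive_loss(embeddings, instance_ids, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's exact loss).

    embeddings:   list of object embedding vectors, e.g. pooled over frames
    instance_ids: object identity for each embedding
    Anchors with no positive (no other embedding with the same ID) are skipped.
    """
    n = len(embeddings)
    losses = []
    for i in range(n):
        positives = [
            j for j in range(n)
            if j != i and instance_ids[j] == instance_ids[i]
        ]
        if not positives:
            continue
        # Temperature-scaled similarities to every other embedding.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n) if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits.values())
        )
        # Average negative log-probability over the anchor's positives.
        anchor_loss = -sum(logits[j] - log_denom for j in positives)
        losses.append(anchor_loss / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


# Toy example: two objects, each observed twice (hypothetical values).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
consistent_ids = [1, 1, 2, 2]    # similar embeddings share an ID
inconsistent_ids = [1, 2, 1, 2]  # dissimilar embeddings share an ID

loss_consistent = instance_contrastive_loss(toy_embeddings, consistent_ids)
loss_inconsistent = instance_contrastive_loss(toy_embeddings, inconsistent_ids)
```

In this toy setup the loss is small when embeddings of the same object are close together and large when they are not, which is the property a tracker can exploit to associate detections across frames by embedding similarity.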

Code Implementations: 1 repo
