CV · Jul 28, 2023

MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking

arXiv:2307.15700v327.2166 citations · h-index: 65 · Has Code

Originality: Highly original

AI Analysis

This work improves multi-object tracking for video analysis applications, representing an incremental advance with strong specific gains.

The paper addresses the lack of long-term temporal modeling in existing multi-object tracking methods. On DanceTrack, it reports improvements of 7.9% and 13.0% over the previous state-of-the-art method on the HOTA and AssA metrics, respectively.

As a video task, Multiple Object Tracking (MOT) is expected to capture temporal information of targets effectively. Unfortunately, most existing methods only explicitly exploit the object features between adjacent frames, while lacking the capacity to model long-term temporal information. In this paper, we propose MeMOTR, a long-term memory-augmented Transformer for multi-object tracking. Our method is able to make the same object's track embedding more stable and distinguishable by leveraging long-term memory injection with a customized memory-attention layer. This significantly improves the target association ability of our model. Experimental results on DanceTrack show that MeMOTR impressively surpasses the state-of-the-art method by 7.9% and 13.0% on HOTA and AssA metrics, respectively. Furthermore, our model also outperforms other Transformer-based methods on association performance on MOT17 and generalizes well on BDD100K. Code is available at https://github.com/MCG-NJU/MeMOTR.
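To make the idea of a long-term memory for track embeddings concrete, here is a minimal sketch. It assumes the memory is kept as an exponential moving average of per-frame embeddings; the abstract does not specify the update rule, so this is a swapped-in illustration rather than the authors' memory-attention layer. The function names, the `rate` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np


def update_long_term_memory(memory, embedding, rate=0.1):
    """Blend the current embedding into the running memory.

    Assumption: an exponential moving average stands in for the
    paper's memory injection, which the abstract does not detail.
    """
    return (1.0 - rate) * memory + rate * embedding


def track_with_memory(embeddings, rate=0.1):
    """Return the memory state after each frame for one tracked object."""
    memory = embeddings[0].copy()  # initialise from the first frame
    history = [memory.copy()]
    for embedding in embeddings[1:]:
        memory = update_long_term_memory(memory, embedding, rate)
        history.append(memory.copy())
    return np.stack(history)


# Synthetic data: one object whose per-frame embedding is a fixed
# "identity" vector plus noise (hypothetical, for illustration only).
rng = np.random.default_rng(0)
identity = np.ones(8)
frames = identity + 0.5 * rng.standard_normal((50, 8))
memories = track_with_memory(frames, rate=0.1)

# Mean frame-to-frame change of the raw embeddings vs. the memory.
raw_jitter = np.abs(np.diff(frames, axis=0)).mean()
memory_jitter = np.abs(np.diff(memories, axis=0)).mean()
print(f"raw jitter: {raw_jitter:.3f}, memory jitter: {memory_jitter:.3f}")
```

In this toy setting the memory changes far less between frames than the raw embeddings do, which is the sense in which a long-term memory can make a track's representation more stable for association.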

