TF-SASM: Training-free Spatial-aware Sparse Memory for Multi-object Tracking
This work addresses efficiency issues in multi-object tracking for computer vision applications, offering an incremental improvement over existing memory-based methods.
The paper tackles the high computational complexity and memory usage of memory-based multi-object tracking by proposing a training-free, spatial-aware sparse memory that selectively stores critical features based on object motion and overlapping awareness. On the DanceTrack test set, this yields gains of 2.0% in AssA and 2.1% in IDF1.
Multi-object tracking (MOT) in computer vision remains a significant challenge, requiring precise localization and continuous tracking of multiple objects in video sequences. The emergence of datasets that emphasize robust re-identification, such as DanceTrack, has highlighted the need for effective solutions. While memory-based approaches have shown promise, they often suffer from high computational complexity and memory usage because they store features at every frame. In this paper, we propose a novel memory-based approach that selectively stores critical features based on object motion and overlapping awareness, aiming to enhance efficiency while minimizing redundancy. As a result, our method not only stores longer temporal information with a limited number of features in memory, but also diversifies the stored states of each object to improve association performance. Our approach significantly improves over MOTRv2 on the DanceTrack test set, with gains of 2.0% in AssA and 2.1% in IDF1.
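To make the idea of selective storage concrete, the following is a minimal illustrative sketch, not the method's actual implementation. It assumes axis-aligned boxes in `(x1, y1, x2, y2)` form, measures motion as one minus the IoU with the last stored box, and treats overlap with any other object above a threshold as a reason to skip the frame. The class name `SparseMemory`, the method `maybe_store`, and all threshold values are hypothetical choices made for illustration; the abstract does not specify the actual criteria.

```python
from collections import deque


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class SparseMemory:
    """Hypothetical per-object memory holding a bounded number of features."""

    def __init__(self, capacity=5, motion_thresh=0.5, overlap_thresh=0.3):
        # Each entry is (frame_idx, box, feature); the oldest entry is
        # dropped automatically once capacity is reached.
        self.entries = deque(maxlen=capacity)
        self.motion_thresh = motion_thresh    # assumed value
        self.overlap_thresh = overlap_thresh  # assumed value

    def maybe_store(self, frame_idx, box, feature, other_boxes):
        """Store the feature only if the object is unoccluded and has moved."""
        # Overlap check: skip frames where another object overlaps this
        # one heavily, since the feature is likely contaminated.
        if any(iou(box, other) > self.overlap_thresh for other in other_boxes):
            return False
        # Motion check: skip frames where the object has barely moved
        # since the last stored state, since the feature is likely redundant.
        if self.entries:
            motion = 1.0 - iou(box, self.entries[-1][1])
            if motion < self.motion_thresh:
                return False
        self.entries.append((frame_idx, box, feature))
        return True


memory = SparseMemory(capacity=2)
memory.maybe_store(0, (0, 0, 10, 10), "feat_0", other_boxes=[])
memory.maybe_store(1, (1, 0, 11, 10), "feat_1", other_boxes=[])  # small motion
stored_frames = [frame for frame, _, _ in memory.entries]
```

Because frames with little motion or heavy overlap are skipped, a fixed-capacity memory of this kind spans a longer time window than one that stores every frame, which is the efficiency argument made above.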