CV · Apr 6, 2025

SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation

arXiv:2504.04519v5 · 20 citations · h-index: 8 · Has Code
Originality: Highly original
AI Analysis

This paper addresses false positives and occlusions in multi-object tracking for computer vision applications, and it positions its approach as a new paradigm rather than an incremental improvement.

The paper tackles multi-object tracking by proposing a segmentation-driven paradigm that replaces the conventional detection-association framework, achieving state-of-the-art results with improvements of +2.1 HOTA and +4.5 IDF1 on DanceTrack.
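For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of their standard definitions, not code from the paper or its repository. IDF1 is the F1 score over identity-matched detections, and HOTA at a single localization threshold is the geometric mean of detection accuracy (DetA) and association accuracy (AssA); the reported HOTA averages this value over a range of thresholds. The function names and the example counts are illustrative assumptions.

```python
import math


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN).

    idtp, idfp, idfn are identity true positives, false positives,
    and false negatives under the best global identity matching.
    """
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


def hota_at_threshold(det_a: float, ass_a: float) -> float:
    """HOTA at one localization threshold: sqrt(DetA * AssA)."""
    return math.sqrt(det_a * ass_a)


# Illustrative numbers only (not taken from the paper).
example_idf1 = idf1(idtp=80, idfp=10, idfn=30)
example_hota = hota_at_threshold(det_a=0.64, ass_a=0.49)
```

Because both scores are usually reported as percentages, a gain of +4.5 IDF1 corresponds to a 0.045 increase in the ratio computed by `idf1`.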

Inspired by Segment Anything 2, which generalizes segmentation from images to videos, we propose SAM2MOT, a novel segmentation-driven paradigm for multi-object tracking that breaks away from the conventional detection-association framework. In contrast to previous approaches that treat segmentation as auxiliary information, SAM2MOT places it at the heart of the tracking process, systematically tackling challenges like false positives and occlusions. Its effectiveness has been thoroughly validated on major MOT benchmarks. Furthermore, SAM2MOT integrates a pre-trained detector and a pre-trained segmentor with tracking logic into a zero-shot MOT system that requires no fine-tuning. This significantly reduces dependence on labeled data and paves the way for transitioning MOT research from task-specific solutions to general-purpose systems. Experiments on DanceTrack, UAVDT, and BDD100K show state-of-the-art results. Notably, SAM2MOT outperforms existing methods on DanceTrack by +2.1 HOTA and +4.5 IDF1, highlighting its effectiveness in MOT. Code is available at https://github.com/TripleJoy/SAM2MOT.
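The abstract does not spell out the tracking logic, so the snippet below is only a hypothetical illustration of one step such a system might need: deciding which detector boxes are not yet covered by any object the segmentor is already tracking, so that they can be handed over as new targets. The IoU test, the 0.5 threshold, and the names `box_iou` and `unmatched_detections` are assumptions made for this sketch and are not taken from the SAM2MOT code.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def unmatched_detections(
    detections: List[Box],
    tracked_boxes: List[Box],
    iou_thresh: float = 0.5,  # assumed value, for illustration only
) -> List[Box]:
    """Return detections that overlap no tracked box at or above iou_thresh.

    tracked_boxes stands in for the bounding boxes of masks that a
    video segmentor is already propagating (hypothetical input).
    """
    return [
        d for d in detections
        if all(box_iou(d, t) < iou_thresh for t in tracked_boxes)
    ]


tracked = [(0.0, 0.0, 10.0, 10.0)]
detected = [(1.0, 1.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
new_objects = unmatched_detections(detected, tracked)
```

In this toy input the first detection overlaps the tracked box heavily and is treated as already tracked, while the second has no overlap and would be a candidate new object.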

Code Implementations: 1 repo