CV | Sep 21, 2024

Temporally Propagated Masks and Bounding Boxes: Combining the Best of Both Worlds for Multi-Object Tracking

arXiv:2409.14220v3 | h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the problem of consistent object tracking in videos for applications like surveillance and sports analysis, offering an incremental improvement by integrating mask and box cues.

The paper tackles the challenge of multi-object tracking by proposing McByte, a method that combines temporally propagated segmentation masks with bounding boxes to improve robustness and generalizability, achieving performance gains across four benchmark datasets.

Multi-object tracking (MOT) involves identifying and consistently tracking objects across video sequences. Traditional tracking-by-detection methods, while effective, often require extensive tuning and lack generalizability. Segmentation mask-based methods, on the other hand, are more generic but struggle with track management, which makes them unsuitable for MOT on their own. We propose a novel approach, McByte, which incorporates a temporally propagated segmentation mask as a strong association cue within a tracking-by-detection framework. By combining bounding box and propagated mask information, McByte enhances robustness and generalizability without per-sequence tuning. Evaluated on four benchmark datasets (DanceTrack, MOT17, SoccerNet-tracking 2022, and KITTI-tracking), McByte demonstrates a performance gain in all cases examined. It also outperforms existing mask-based methods. Implementation code will be provided upon acceptance.
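
To make the idea of using a propagated mask as an association cue more concrete, here is a minimal toy sketch in Python. It is not the McByte implementation: the abstract does not say how the mask and box cues are combined, so the weighted-sum cost, the `mask_weight` and `max_cost` parameters, the pixel-set mask representation, and all function names below are assumptions made purely for illustration. Greedy matching stands in for the optimal assignment solver a real tracker would likely use.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Mask = Set[Tuple[int, int]]               # set of (x, y) pixel coordinates


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mask_coverage(mask: Mask, box: Box) -> float:
    """Fraction of the track's propagated mask pixels that fall inside a box."""
    if not mask:
        return 0.0
    inside = sum(1 for x, y in mask if box[0] <= x < box[2] and box[1] <= y < box[3])
    return inside / len(mask)


def association_cost(track_box: Box, track_mask: Mask, det_box: Box,
                     mask_weight: float) -> float:
    """Hypothetical cost: a weighted mix of box IoU and mask coverage.

    mask_weight = 0 reduces to a plain IoU cost (box cue only).
    """
    similarity = ((1.0 - mask_weight) * box_iou(track_box, det_box)
                  + mask_weight * mask_coverage(track_mask, det_box))
    return 1.0 - similarity


def associate(tracks: List[Tuple[Box, Mask]], detections: List[Box],
              mask_weight: float = 0.5, max_cost: float = 0.8) -> Dict[int, int]:
    """Greedily match tracks to detections, cheapest pair first.

    Returns a dict mapping track index to detection index.
    """
    pairs = sorted(
        (association_cost(tb, tm, db, mask_weight), ti, di)
        for ti, (tb, tm) in enumerate(tracks)
        for di, db in enumerate(detections)
    )
    matches: Dict[int, int] = {}
    used_detections: Set[int] = set()
    for cost, ti, di in pairs:
        if cost > max_cost:
            break  # all remaining pairs are even more expensive
        if ti in matches or di in used_detections:
            continue
        matches[ti] = di
        used_detections.add(di)
    return matches


# Toy scene: two objects have just crossed, so each track's last known box
# overlaps the other object's detection more than its own, while the
# propagated masks still sit on the correct objects.
tracks = [
    ((3, 0, 9, 10), {(x, y) for x in range(0, 3) for y in range(10)}),
    ((1, 0, 7, 10), {(x, y) for x in range(7, 10) for y in range(10)}),
]
detections = [(0, 0, 6, 10), (4, 0, 10, 10)]

box_only = associate(tracks, detections, mask_weight=0.0)
box_and_mask = associate(tracks, detections, mask_weight=0.5)
```

In this toy scene the box-only cost swaps the two identities, while adding the mask term keeps each track on its own object. The scene is contrived to show why a second cue can help in ambiguous cases; it says nothing about the gains reported in the paper.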
