CV · May 7, 2021

MOTR: End-to-End Multiple-Object Tracking with Transformer

arXiv:2105.03247v4 · 789 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses end-to-end temporal modeling in multiple-object tracking (MOT) for applications such as video analysis and offers a stronger baseline for future research, though it builds incrementally on DETR.

The paper tackles temporal modeling in multiple-object tracking by proposing MOTR, an end-to-end Transformer-based method that introduces track queries for iterative prediction. It achieves a 6.5% improvement in HOTA over ByteTrack on DanceTrack and outperforms the concurrent works TrackFormer and TransTrack on MOT17 in association performance.

Temporal modeling of objects is a key challenge in multiple object tracking (MOT). Existing methods track by associating detections through motion-based and appearance-based similarity heuristics. The post-processing nature of association prevents end-to-end exploitation of temporal variations in video sequences. In this paper, we propose MOTR, which extends DETR and introduces the track query to model the tracked instances in the entire video. The track query is transferred and updated frame by frame to perform iterative prediction over time. We propose tracklet-aware label assignment to train track queries and newborn object queries. We further propose a temporal aggregation network and a collective average loss to enhance temporal relation modeling. Experimental results on DanceTrack show that MOTR significantly outperforms the state-of-the-art method ByteTrack by 6.5% on the HOTA metric. On MOT17, MOTR outperforms our concurrent works, TrackFormer and TransTrack, on association performance. MOTR can serve as a stronger baseline for future research on temporal modeling and Transformer-based trackers. Code is available at https://github.com/megvii-research/MOTR.
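The frame-by-frame transfer of track queries described in the abstract can be illustrated with a toy sketch. This is not the MOTR implementation: `toy_decoder`, `run_tracker`, the 1-D positions, and the `radius` parameter are all hypothetical names and simplifications chosen for illustration. In particular, the nearest-neighbor rule inside `toy_decoder` only stands in for the learned Transformer decoder so that the example runs; the paper's point is that this update is learned end to end rather than hand-crafted. Dropping a track as soon as it goes unmatched is also a simplification.

```python
from itertools import count


def toy_decoder(track_queries, frame, radius=1.5):
    """Hypothetical stand-in for the learned Transformer decoder.

    track_queries: dict of track_id -> last position (a toy 'embedding').
    frame: list of 1-D object positions visible in this frame.
    Returns (updated, unclaimed): the updated track queries and the
    positions that no track query claimed.
    """
    unclaimed = list(frame)
    updated = {}
    for track_id, pos in track_queries.items():
        if not unclaimed:
            break
        nearest = min(unclaimed, key=lambda p: abs(p - pos))
        if abs(nearest - pos) <= radius:
            updated[track_id] = nearest  # the query keeps its identity
            unclaimed.remove(nearest)
        # otherwise the query is dropped (assumed immediate track exit)
    return updated, unclaimed


def run_tracker(frames):
    """Carry track queries from frame to frame, adding newborn tracks."""
    next_id = count()
    track_queries = {}
    history = []
    for frame in frames:
        # Track queries from the previous frame predict in the current one.
        track_queries, unclaimed = toy_decoder(track_queries, frame)
        # Objects left over play the role of newborn object queries.
        for pos in unclaimed:
            track_queries[next(next_id)] = pos
        history.append(dict(track_queries))
    return history


frames = [[0.0, 10.0], [0.5, 9.0], [1.0, 8.0, 20.0], [1.5, 20.5]]
history = run_tracker(frames)
for t, tracks in enumerate(history):
    print(t, tracks)
```

In this run, identities 0 and 1 persist across the first three frames, identity 2 is born in the third frame, and identity 1 exits in the last frame because nothing near its previous position remains.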

Code Implementations: 2 repos
