CV · Jun 9, 2022

Simple Cues Lead to a Strong Multi-Object Tracker

arXiv:2206.04656v799 citations · h-index: 45 · Has Code
Originality: Incremental advance
AI Analysis

This work provides a simple and effective solution for multi-object tracking in computer vision, though it is incremental as it builds on existing tracking-by-detection methods.

The paper tackles multi-object tracking by revisiting the tracking-by-detection paradigm, showing that a standard re-identification network combined with a simple motion model achieves state-of-the-art performance on four public datasets.

For a long time, the most common paradigm in Multi-Object Tracking was tracking-by-detection (TbD), where objects are first detected and then associated over video frames. For association, most models resorted to motion and appearance cues, e.g., re-identification networks. Recent approaches based on attention propose to learn the cues in a data-driven manner, showing impressive results. In this paper, we ask ourselves whether simple good old TbD methods are also capable of achieving the performance of end-to-end models. To this end, we propose two key ingredients that allow a standard re-identification network to excel at appearance-based tracking. We extensively analyse its failure cases, and show that a combination of our appearance features with a simple motion model leads to strong tracking results. Our tracker generalizes to four public datasets, namely MOT17, MOT20, BDD100k, and DanceTrack, achieving state-of-the-art performance. Code is available at https://github.com/dvl-tum/GHOST.
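To make the association step of tracking-by-detection concrete, here is a minimal sketch of how an appearance cue and a motion cue might be combined when matching existing tracks to new detections. It is not the authors' implementation: the function names, the equal weighting, the cost threshold, the constant-velocity box shift, and the greedy matching (used here in place of an optimal assignment solver) are all assumptions made for illustration.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine_distance(u, v):
    """1 - cosine similarity between two appearance feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def predict_box(box, velocity):
    """Hypothetical constant-velocity motion model: shift the box by (dx, dy)."""
    dx, dy = velocity
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


def associate(tracks, detections, weight=0.5, max_cost=0.7):
    """Greedily match tracks to detections using a weighted sum of cues.

    tracks: dicts with "box", "velocity", and "feat".
    detections: dicts with "box" and "feat".
    weight and max_cost are illustrative values, not taken from the paper.
    """
    candidates = []
    for ti, t in enumerate(tracks):
        predicted = predict_box(t["box"], t["velocity"])
        for di, d in enumerate(detections):
            appearance = cosine_distance(t["feat"], d["feat"])
            motion = 1.0 - iou(predicted, d["box"])
            cost = weight * appearance + (1.0 - weight) * motion
            if cost <= max_cost:
                candidates.append((cost, ti, di))

    # Cheapest pairs first; each track and detection is used at most once.
    candidates.sort()
    matches, used_t, used_d = [], set(), set()
    for _, ti, di in candidates:
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)

    unmatched_tracks = [i for i in range(len(tracks)) if i not in used_t]
    unmatched_dets = [i for i in range(len(detections)) if i not in used_d]
    return matches, unmatched_tracks, unmatched_dets


if __name__ == "__main__":
    tracks = [
        {"box": (0, 0, 10, 10), "velocity": (1, 0), "feat": (1.0, 0.0)},
        {"box": (50, 50, 60, 60), "velocity": (1, 0), "feat": (0.0, 1.0)},
    ]
    detections = [
        {"box": (51, 50, 61, 60), "feat": (0.0, 1.0)},
        {"box": (1, 0, 11, 10), "feat": (1.0, 0.0)},
        {"box": (200, 200, 210, 210), "feat": (0.6, 0.8)},
    ]
    print(associate(tracks, detections))
```

Unmatched detections would typically start new tracks, and unmatched tracks would be kept inactive for some number of frames before being dropped; both policies are left out of this sketch.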

Code Implementations: 1 repo
