Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking
This work addresses the need for mobile systems to track generic, unknown objects in rich human-made environments. It is an incremental improvement that extends tracking capabilities beyond predefined object categories.
The paper tackles the problem of multi-object tracking in diverse environments where detectors for every object category are infeasible, proposing a category-agnostic segmentation-based tracker that achieves performance comparable to state-of-the-art methods for cars and pedestrians while also tracking a large variety of other objects.
The most common paradigm for vision-based multi-object tracking is tracking-by-detection, due to the availability of reliable detectors for several important object categories such as cars and pedestrians. However, future mobile systems will need a capability to cope with rich human-made environments, in which obtaining detectors for every possible object category would be infeasible. In this paper, we propose a model-free multi-object tracking approach that uses a category-agnostic image segmentation method to track objects. We present an efficient segmentation mask-based tracker which associates pixel-precise masks reported by the segmentation. Our approach can utilize semantic information whenever it is available for classifying objects at the track level, while retaining the capability to track generic unknown objects in the absence of such information. We demonstrate experimentally that our approach achieves performance comparable to state-of-the-art tracking-by-detection methods for popular object categories such as cars and pedestrians. Additionally, we show that the proposed method can discover and robustly track a large variety of other objects.
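To make the idea of associating pixel-precise masks concrete, the following is a minimal sketch, not the paper's actual association procedure, which the abstract does not specify. It assumes masks are represented as sets of (row, col) pixels and that existing tracks are matched to new segments greedily by mask intersection-over-union (IoU). The names `mask_iou`, `associate`, `rect`, and the `min_iou` threshold of 0.3 are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]   # (row, col)
Mask = Set[Pixel]         # pixels covered by one segment


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two pixel masks (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def associate(tracks: Dict[int, Mask],
              segments: List[Mask],
              min_iou: float = 0.3) -> Dict[int, int]:
    """Greedily match track ids to segment indices by descending mask IoU.

    Assumption: greedy matching with a fixed IoU threshold. This is an
    illustrative stand-in, not the association method used in the paper.
    """
    candidates = sorted(
        ((mask_iou(track_mask, seg), tid, idx)
         for tid, track_mask in tracks.items()
         for idx, seg in enumerate(segments)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_segments = set()
    for iou, tid, idx in candidates:
        if iou < min_iou:
            break  # all remaining candidates overlap too little
        if tid in matches or idx in used_segments:
            continue  # each track and each segment is matched at most once
        matches[tid] = idx
        used_segments.add(idx)
    return matches


def rect(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Hypothetical helper: a filled rectangle as a pixel mask."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


# Two existing tracks and three segments from the next frame.
tracks = {1: rect(0, 0, 4, 4), 2: rect(10, 10, 14, 14)}
segments = [rect(10, 11, 14, 15), rect(0, 1, 4, 5), rect(20, 20, 22, 22)]
matches = associate(tracks, segments)
print(matches)
```

In this toy example each track's mask has shifted by one pixel column, so both tracks are matched to their shifted segments, while the third segment overlaps no track and remains unmatched. In a category-agnostic setting, such an unmatched segment would be a natural candidate for starting a new track of an unknown object; any category label, when semantic information is available, could then be assigned later at the track level.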