An Exploration of Target-Conditioned Segmentation Methods for Visual Object Trackers
This work addresses the need for precise object representation in visual object tracking, but it is incremental: it adapts existing methods rather than introducing a new paradigm.
The paper tackles the problem of transforming bounding-box trackers into segmentation trackers by exploring target-conditioned segmentation methods, and it achieves performance competitive with recent segmentation trackers while running at quasi-real-time speeds.
Visual object tracking is the problem of predicting a target object's state in a video. Generally, bounding boxes have been used to represent states, and the community has devoted considerable effort to producing efficient causal algorithms capable of locating targets with such representations. As the field is moving towards binary segmentation masks to define objects more precisely, in this paper we extensively explore the target-conditioned segmentation methods available in the computer vision community, in order to transform any bounding-box tracker into a segmentation tracker. Our analysis shows that such methods allow trackers to compete with recently proposed segmentation trackers while running in quasi real time.
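To make the idea of wrapping a bounding-box tracker concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes a hypothetical interface in which a box tracker is any callable mapping a frame to an `(x, y, width, height)` box, and a target-conditioned segmenter is any callable mapping a frame and that box to a binary mask. The names `track_with_masks` and `threshold_segmenter` are illustrative only, and the thresholding rule is a toy stand-in for a learned segmentation model.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[List[float]]        # grayscale image as rows of pixel intensities
Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels
Mask = List[List[int]]           # binary mask with the same size as the frame


def threshold_segmenter(frame: Frame, box: Box) -> Mask:
    """Toy target-conditioned segmenter (assumption: stands in for a learned model).

    Marks as foreground the pixels inside the box that are brighter than
    the mean intensity of the box. Pixels outside the box are background.
    """
    x, y, w, h = box
    height, width = len(frame), len(frame[0])
    # Clip the box to the frame so out-of-range boxes yield an empty region.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)

    mask = [[0] * width for _ in range(height)]
    pixels = [frame[r][c] for r in range(y0, y1) for c in range(x0, x1)]
    if not pixels:
        return mask

    mean = sum(pixels) / len(pixels)
    for r in range(y0, y1):
        for c in range(x0, x1):
            if frame[r][c] > mean:
                mask[r][c] = 1
    return mask


def track_with_masks(
    frames: Sequence[Frame],
    box_tracker: Callable[[Frame], Box],
    segmenter: Callable[[Frame, Box], Mask],
) -> List[Mask]:
    """Turn per-frame box predictions into per-frame masks.

    Frames are processed strictly in order, so the pipeline stays causal:
    the mask for a frame depends only on that frame and the tracker's
    state up to it.
    """
    masks = []
    for frame in frames:
        box = box_tracker(frame)              # hypothetical tracker call
        masks.append(segmenter(frame, box))   # segmentation conditioned on the box
    return masks
```

Under these assumptions, any tracker exposing a frame-to-box callable can be paired with any segmenter, for example `track_with_masks(frames, my_tracker, threshold_segmenter)`, where `my_tracker` is a placeholder for an actual bounding-box tracker. The added cost per frame is one segmenter call, which is why the speed of the segmentation method determines whether the combined tracker remains close to real time.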