Collecting Consistently High Quality Object Tracks with Minimal Human Involvement by Using Self-Supervised Learning to Detect Tracker Errors
This work addresses the challenge of reducing human effort in object tracking for video analysis applications, representing an incremental improvement over existing methods.
The paper tackles the problem of maintaining high-quality object tracking with minimal human intervention by using self-supervised learning to detect tracker failures, resulting in improved performance on small, fast-moving, or occluded objects across three datasets.
We propose a hybrid framework for consistently producing high-quality object tracks by combining an automated object tracker with little human input. The key idea is to tailor a module for each dataset that decides when the object tracker is failing, so that a human can be brought in to re-localize the object and tracking can continue. Our approach uses self-supervised learning on unlabeled videos to learn a tailored representation of the target object, which is then used to actively monitor its tracked region and decide when the tracker fails. Because no labeled data is needed, our approach can be applied to novel object categories. Experiments on three datasets demonstrate that our method outperforms existing approaches, especially for small, fast-moving, or occluded objects.
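The abstract does not say how the failure-detection module scores a tracked region, so the following is only a minimal sketch of the human-in-the-loop control flow under stated assumptions. It assumes the learned representation is exposed as an embedding function, and that a failure is declared when the cosine similarity between the current tracked region's embedding and the first-frame embedding drops below a fixed threshold. The names `track_step`, `embed_region`, `ask_human`, the `(x, y, w, h)` box format, and the 0.8 threshold are all hypothetical, and the toy tracker and embedding at the bottom only simulate a single drift event.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical box format: (x, y, width, height).
Box = Tuple[float, float, float, float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def track_with_monitor(
    frames: Sequence[object],
    init_box: Box,
    track_step: Callable[[object, Box], Box],        # hypothetical automated tracker
    embed_region: Callable[[object, Box], List[float]],  # hypothetical learned representation
    ask_human: Callable[[object], Box],               # hypothetical human re-localization
    threshold: float = 0.8,                           # assumed failure threshold
) -> Tuple[List[Box], List[int]]:
    """Track frame by frame; hand control to a human when the region looks wrong."""
    reference = embed_region(frames[0], init_box)  # appearance of the target in frame 0
    boxes: List[Box] = [init_box]
    human_frames: List[int] = []
    box = init_box
    for t in range(1, len(frames)):
        box = track_step(frames[t], box)
        score = cosine_similarity(reference, embed_region(frames[t], box))
        if score < threshold:
            # Monitor flags a likely tracker failure: ask a human for a new box.
            box = ask_human(frames[t])
            human_frames.append(t)
        boxes.append(box)
    return boxes, human_frames


# Toy usage: frame t is the integer t, and the object sits at x = 10 * t.
def toy_tracker(frame: object, box: Box) -> Box:
    x, y, w, h = box
    step = 110.0 if frame == 3 else 10.0  # simulate one large drift at frame 3
    return (x + step, y, w, h)


def toy_embed(frame: object, box: Box) -> List[float]:
    true_x = 10.0 * float(frame)
    return [1.0, 0.0] if abs(box[0] - true_x) < 5.0 else [0.0, 1.0]


def toy_human(frame: object) -> Box:
    return (10.0 * float(frame), 0.0, 20.0, 20.0)


boxes, human_frames = track_with_monitor(
    list(range(6)), (0.0, 0.0, 20.0, 20.0), toy_tracker, toy_embed, toy_human
)
print(human_frames)  # frames where a human was asked to re-localize
```

In this toy run only frame 3 is handed to the human, and tracking resumes from the corrected box. The sketch keeps the first-frame embedding as a fixed reference for simplicity; a real system might refresh it after each human correction.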