Semi-TCL: Semi-Supervised Track Contrastive Representation Learning
This work addresses robust multiple object tracking in videos, offering an incremental improvement over existing methods.
The paper tackles the problem of learning appearance embeddings for multiple object tracking by introducing an instance-to-track matching objective that leverages temporal continuity, enabling semi-supervised learning from both labeled and unlabeled videos. The method outperforms state-of-the-art approaches on multiple benchmarks.
Online tracking of multiple objects in videos requires a strong capacity for modeling and matching object appearances. Previous methods for learning appearance embeddings mostly rely on instance-level matching and do not consider the temporal continuity provided by videos. We design a new instance-to-track matching objective for learning appearance embeddings, which compares a candidate detection to the embeddings of the tracks persisted in the tracker. This enables us to learn not only from videos labeled with complete tracks, but also from unlabeled or partially labeled videos. We implement this learning objective in a unified form following the spirit of contrastive loss. Experiments on multiple object tracking datasets demonstrate that our method can effectively learn discriminative appearance embeddings in a semi-supervised fashion and outperforms state-of-the-art methods on representative benchmarks.
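To make the instance-to-track idea concrete, the following is a minimal sketch of a contrastive objective that scores each candidate detection against a set of track embeddings. It is not the paper's exact formulation: the mean-pooled track summary, the cosine similarity, the softmax (InfoNCE-style) form, the `temperature` value, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def track_embedding(instance_embeddings):
    """Summarize a track as the normalized mean of its instance embeddings.

    Assumption: mean pooling is one simple choice of track summary.
    instance_embeddings: array of shape (T, D), one row per past detection.
    """
    return l2_normalize(np.mean(instance_embeddings, axis=0))


def instance_to_track_loss(detections, tracks, labels, temperature=0.1):
    """InfoNCE-style loss that pulls each detection toward its own track.

    detections: (N, D) candidate detection embeddings.
    tracks:     (K, D) one embedding per track held by the tracker.
    labels:     (N,) index of the matching track for each detection
                (hypothetical input; how matches are obtained for
                unlabeled video is outside this sketch).
    """
    d = l2_normalize(detections)
    t = l2_normalize(tracks)
    logits = d @ t.T / temperature                    # (N, K) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(labels)), labels].mean())


if __name__ == "__main__":
    # Two toy tracks in a 2-D embedding space and two detections near them.
    tracks = np.stack([
        track_embedding(np.array([[1.0, 0.0], [0.8, 0.2]])),
        track_embedding(np.array([[0.0, 1.0], [0.2, 0.8]])),
    ])
    detections = np.array([[0.9, 0.1], [0.1, 0.9]])
    print("correct matches:", instance_to_track_loss(detections, tracks, np.array([0, 1])))
    print("swapped matches:", instance_to_track_loss(detections, tracks, np.array([1, 0])))
```

In this toy setup the loss is lower when each detection is assigned to the track it resembles than when the assignments are swapped, which is the behavior a contrastive matching objective is meant to reward.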