Exploiting Temporal Coherence for Self-Supervised One-shot Video Re-identification
This work addresses the need for efficient labeling in large camera networks, offering an incremental improvement over existing one-shot methods.
The paper tackles the problem of reducing labeling effort in video re-identification by proposing a self-supervised one-shot method that exploits temporal coherence among unlabeled tracklets, achieving up to 8% improvement in label estimation accuracy and better re-identification performance compared to state-of-the-art techniques.
While supervised techniques in re-identification are extremely effective, the need for large amounts of annotations makes them impractical for large camera networks. One-shot re-identification, which uses a singular labeled tracklet for each identity along with a pool of unlabeled tracklets, is a potential candidate towards reducing this labeling effort. Current one-shot re-identification methods function by modeling the inter-relationships amongst the labeled and the unlabeled data, but fail to fully exploit such relationships that exist within the pool of unlabeled data itself. In this paper, we propose a new framework named Temporal Consistency Progressive Learning, which uses temporal coherence as a novel self-supervised auxiliary task in the one-shot learning paradigm to capture such relationships amongst the unlabeled tracklets. Optimizing two new losses, which enforce consistency on a local and global scale, our framework can learn learn richer and more discriminative representations. Extensive experiments on two challenging video re-identification datasets - MARS and DukeMTMC-VideoReID - demonstrate that our proposed method is able to estimate the true labels of the unlabeled data more accurately by up to $8\%$, and obtain significantly better re-identification performance compared to the existing state-of-the-art techniques.