CV, LG · Feb 18, 2020

MAST: A Memory-Augmented Self-supervised Tracker

arXiv:2002.07793v2 · 202 citations
AI Analysis

This work addresses the need for annotation-free dense tracking in computer vision, offering a competitive self-supervised approach that could reduce reliance on labeled data.

The paper tackles self-supervised dense tracking, which still lags behind supervised methods. It proposes a memory-augmented model that surpasses previous self-supervised methods on existing benchmarks by a significant margin (+15%) and achieves performance comparable to supervised methods.

Recent interest in self-supervised dense tracking has yielded rapid progress, but performance still remains far from supervised methods. We propose a dense tracking model trained on videos without any annotations that surpasses previous self-supervised methods on existing benchmarks by a significant margin (+15%), and achieves performance comparable to supervised methods. In this paper, we first reassess the traditional choices used for self-supervised training and reconstruction loss by conducting thorough experiments that finally elucidate the optimal choices. Second, we further improve on existing methods by augmenting our architecture with a crucial memory component. Third, we benchmark on large-scale semi-supervised video object segmentation (a.k.a. dense tracking), and propose a new metric: generalizability. Our first two contributions yield a self-supervised network that, for the first time, is competitive with supervised methods on standard evaluation metrics of dense tracking. When measuring generalizability, we show that self-supervised approaches are actually superior to the majority of supervised methods. We believe this new generalizability metric can better capture the real-world use cases for dense tracking, and will spur new interest in this research direction.
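The abstract mentions a reconstruction loss and a memory component without spelling out the mechanics. The following is a minimal sketch of the general idea behind reconstruction-based dense tracking, under stated assumptions. Each pixel of a target frame is rebuilt as a softmax-weighted average of pixels stored from several past frames (the memory), and the reconstruction is scored against the true target. The function names, the dot-product affinity with a temperature, global attention over the whole memory, and the Huber loss are illustrative assumptions, not the authors' implementation. The hand-made toy features stand in for the output of a learned encoder.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def reconstruct_from_memory(query_feat, memory_feats, memory_values, temperature=1.0):
    """Rebuild each query pixel as an affinity-weighted average of memory pixels.

    Hypothetical helper (assumption, not the paper's code).
    query_feat:    (N, C) features of the N pixels in the target frame
    memory_feats:  (M, C) features of all pixels from the past frames in memory
    memory_values: (M, D) what gets propagated, e.g. colour channels during
                   training or segmentation labels at test time
    """
    affinity = query_feat @ memory_feats.T / temperature  # (N, M) similarities
    weights = softmax(affinity, axis=1)                    # each row sums to 1
    return weights @ memory_values                         # (N, D)


def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear for large ones (assumed loss)."""
    diff = np.abs(pred - target)
    quadratic = 0.5 * diff ** 2
    linear = delta * (diff - 0.5 * delta)
    return float(np.mean(np.where(diff <= delta, quadratic, linear)))


# Toy example: two past frames with 2 pixels each form a 4-pixel memory.
frame_a_feats = np.array([[5.0, 0.0, 0.0, 0.0], [0.0, 5.0, 0.0, 0.0]])
frame_b_feats = np.array([[0.0, 0.0, 5.0, 0.0], [0.0, 0.0, 0.0, 5.0]])
frame_a_colors = np.array([[1.0, 0.0], [0.0, 1.0]])
frame_b_colors = np.array([[0.5, 0.5], [0.2, 0.8]])

memory_feats = np.concatenate([frame_a_feats, frame_b_feats])     # (4, 4)
memory_colors = np.concatenate([frame_a_colors, frame_b_colors])  # (4, 2)

# The target frame's pixels resemble pixel 1 of frame A and pixel 0 of frame B.
query_feats = np.array([[0.0, 5.0, 0.0, 0.0], [0.0, 0.0, 5.0, 0.0]])
target_colors = np.array([[0.0, 1.0], [0.5, 0.5]])

recon = reconstruct_from_memory(query_feats, memory_feats, memory_colors, temperature=0.5)
loss = huber_loss(recon, target_colors)
```

Because the training signal is the reconstruction of the frame's own colours, no labels are needed. In this style of method, the same affinity weights can then propagate a segmentation mask from earlier frames at inference time, which is what turns the learned correspondence into a dense tracker. Keeping several past frames in memory, rather than only the previous one, gives each pixel more candidates to match against.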
