CV, AI, LG | Aug 28, 2023

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

Meta AI
arXiv:2308.14710v1 | 38 citations | h-index: 156
Originality: Highly original
AI Analysis

This paper addresses the problem of segmenting and tracking multiple instances in videos without motion-based signals, offering computer vision researchers a surprisingly simple and effective approach.

The paper tackles unsupervised video instance segmentation by introducing VideoCutLER, a method trained on high-quality pseudo masks and synthesized videos. It achieves 50.7% $\mathrm{AP}^{\mathrm{video}}_{50}$ on YouTubeVIS-2019, surpassing the previous state of the art by a large margin.

Existing approaches to unsupervised video instance segmentation typically rely on motion estimates and experience difficulties tracking small or divergent motions. We present VideoCutLER, a simple method for unsupervised multi-instance video segmentation without using motion-based learning signals like optical flow or training on natural videos. Our key insight is that using high-quality pseudo masks and a simple video synthesis method for model training is surprisingly sufficient to enable the resulting video model to effectively segment and track multiple instances across video frames. We show the first competitive unsupervised learning results on the challenging YouTubeVIS-2019 benchmark, achieving 50.7% $\mathrm{AP}^{\mathrm{video}}_{50}$, surpassing the previous state-of-the-art by a large margin. VideoCutLER can also serve as a strong pretrained model for supervised video instance segmentation tasks, exceeding DINO by 15.9% on YouTubeVIS-2019 in terms of $\mathrm{AP}^{\mathrm{video}}$.
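To make the idea of training on synthesized videos more concrete, here is a minimal toy sketch of how a short clip with consistent instance identities could be built from the pseudo masks of a single image. It is not the authors' implementation: the abstract does not specify the synthesis procedure, so the random-translation scheme, the function names `shift_mask` and `synthesize_clip`, and the parameters `num_frames`, `max_step`, and `seed` are all assumptions made for illustration. It works on label maps only; in a fuller version the image pixels under each mask would be moved the same way.

```python
import random
from typing import List

Grid = List[List[int]]  # 2-D grid; 0 = background, k > 0 = instance id k


def shift_mask(mask: Grid, dy: int, dx: int) -> Grid:
    """Translate a binary mask by (dy, dx); pixels moved off-grid are dropped."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    out[ny][nx] = 1
    return out


def synthesize_clip(masks: List[Grid], num_frames: int,
                    max_step: int = 1, seed: int = 0) -> List[Grid]:
    """Build a toy clip of instance-id maps from per-instance binary masks.

    Hypothetical scheme (an assumption, not the paper's procedure):
    frame 0 keeps the original layout, and in every later frame each
    instance drifts by a small random step. Instance k keeps id k + 1 in
    all frames, so the clip carries tracking labels without any motion
    estimation or natural video.
    """
    rng = random.Random(seed)
    h, w = len(masks[0]), len(masks[0][0])
    offsets = [(0, 0)] * len(masks)
    clip: List[Grid] = []
    for t in range(num_frames):
        if t > 0:
            # Accumulate a random step per instance to simulate motion.
            offsets = [
                (dy + rng.randint(-max_step, max_step),
                 dx + rng.randint(-max_step, max_step))
                for dy, dx in offsets
            ]
        frame = [[0] * w for _ in range(h)]
        for k, (mask, (dy, dx)) in enumerate(zip(masks, offsets)):
            moved = shift_mask(mask, dy, dx)
            for y in range(h):
                for x in range(w):
                    if moved[y][x]:
                        frame[y][x] = k + 1  # later instances overwrite earlier ones
        clip.append(frame)
    return clip


if __name__ == "__main__":
    # Two 2x2 pseudo masks on a 4x4 grid, standing in for masks from an image model.
    masks = [
        [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
        [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]],
    ]
    for t, frame in enumerate(synthesize_clip(masks, num_frames=3)):
        print(f"frame {t}:")
        for row in frame:
            print(" ", row)
```

The point of the sketch is that each instance's identity is known by construction in every frame, which is the kind of supervision a video model needs to learn tracking from still images alone.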

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
