CV | May 19, 2025

FlowCut: Unsupervised Video Instance Segmentation via Temporal Mask Matching

arXiv:2505.13174v1 | 1 citation | h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses the problem of segmenting instances in videos without manual annotations. The advance is rated incremental because it builds on existing unsupervised methods.

The paper tackles unsupervised video instance segmentation with FlowCut, a three-stage framework that generates pseudo-instance masks from images and optical flows, matches them temporally across frames, and trains a video segmentation model on the resulting pseudo-labeled set. It reports state-of-the-art results on YouTubeVIS-2019, YouTubeVIS-2021, DAVIS-2017, and DAVIS-2017 Motion.

We propose FlowCut, a simple and capable method for unsupervised video instance segmentation consisting of a three-stage framework to construct a high-quality video dataset with pseudo-labels. To our knowledge, our work is the first attempt to curate a video dataset with pseudo-labels for unsupervised video instance segmentation. In the first stage, we generate pseudo-instance masks by exploiting the affinities of features from both images and optical flows. In the second stage, we construct short video segments containing high-quality, consistent pseudo-instance masks by temporally matching them across the frames. In the third stage, we use the YouTubeVIS-2021 video dataset to extract our training instance segmentation set, and then train a video segmentation model. FlowCut achieves state-of-the-art performance on the YouTubeVIS-2019, YouTubeVIS-2021, DAVIS-2017, and DAVIS-2017 Motion benchmarks.
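
The second stage, temporal matching of pseudo-instance masks, can be illustrated with a minimal sketch. The abstract does not say how masks are matched, so everything here is an assumption made for illustration: masks are compared by intersection-over-union (IoU), pairs are accepted greedily from the highest score down, and the `min_iou` threshold of 0.5 is a hypothetical value. The names `mask_iou`, `match_masks`, and `rect` are invented for this example and are not taken from the paper.

```python
from typing import FrozenSet, List, Optional, Tuple

Pixel = Tuple[int, int]
Mask = FrozenSet[Pixel]  # set of (row, col) pixels belonging to one pseudo-instance


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two pixel sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_masks(
    prev: List[Mask], curr: List[Mask], min_iou: float = 0.5
) -> List[Optional[int]]:
    """Greedily match each mask in `prev` to at most one mask in `curr`.

    Entry i of the result is the index in `curr` matched to prev[i], or None
    if no unused candidate reaches `min_iou` (an assumed threshold).
    """
    # Score every pair, then accept pairs from the highest IoU downward.
    pairs = sorted(
        ((mask_iou(p, c), i, j) for i, p in enumerate(prev) for j, c in enumerate(curr)),
        reverse=True,
    )
    result: List[Optional[int]] = [None] * len(prev)
    used = set()
    for iou, i, j in pairs:
        if iou < min_iou:
            break  # remaining pairs overlap too little to count as the same instance
        if result[i] is None and j not in used:
            result[i] = j
            used.add(j)
    return result


def rect(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Toy rectangular mask covering rows r0..r1-1 and columns c0..c1-1."""
    return frozenset((r, c) for r in range(r0, r1) for c in range(c0, c1))


# Two instances in frame t; in frame t+1 both shift one pixel right,
# appear in a different order, and a third, unrelated mask shows up.
frame_t = [rect(0, 0, 4, 4), rect(10, 10, 14, 14)]
frame_t1 = [rect(10, 11, 14, 15), rect(0, 1, 4, 5), rect(20, 20, 22, 22)]
matches = match_masks(frame_t, frame_t1)
print(matches)  # [1, 0]
```

In this toy setup, a mask with no sufficiently overlapping partner stays unmatched (`None`), which is one simple way a pipeline could discard inconsistent masks when keeping only short segments with consistent instances. An optimal assignment (for example, the Hungarian algorithm) could replace the greedy loop; greedy matching is used here only to keep the sketch dependency-free.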
