CV | Sep 20, 2023

Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation

arXiv:2309.11160v1 | 12 citations | h-index: 56 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the underexplored problem of segmenting objects in videos from only a few annotated examples, a domain-specific advance for video analysis applications.

The paper tackles few-shot video object segmentation by introducing multi-grained temporal prototypes to capture local and long-term guidance, achieving state-of-the-art performance on two benchmark datasets with significant improvements over previous models.

Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video that belong to the same category as the objects defined by a few annotated support images. However, this task has seldom been explored. In this work, based on IPMT, a state-of-the-art few-shot image segmentation method that combines external support guidance information with adaptive query guidance cues, we propose to leverage multi-grained temporal guidance information to handle the temporal correlation inherent in video data. We decompose the query video information into a clip prototype and a memory prototype for capturing local and long-term internal temporal guidance, respectively. Frame prototypes are further used for each frame independently to handle fine-grained adaptive guidance and enable bidirectional clip-frame prototype communication. To reduce the influence of noisy memory, we propose to leverage the structural similarity relation among different predicted regions and the support for selecting reliable memory frames. Furthermore, a new segmentation loss is proposed to enhance the category discriminability of the learned prototypes. Experimental results demonstrate that our proposed video IPMT model significantly outperforms previous models on two benchmark datasets. Code is available at https://github.com/nankepan/VIPMT.
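
The prototype idea in the abstract can be illustrated with a toy sketch. This is not the authors' VIPMT implementation: it assumes prototypes are obtained by masked average pooling of per-pixel features, and it swaps in a plain cosine-similarity threshold for the paper's structural similarity relation when picking reliable memory frames. All function names, the threshold value, and the toy feature values are hypothetical.

```python
import math

def masked_average_prototype(features, mask):
    """Average per-pixel feature vectors, weighted by a (soft) mask."""
    dim = len(features[0])
    total = sum(mask)
    if total == 0:
        return [0.0] * dim  # empty mask: no foreground evidence
    return [sum(w * f[d] for f, w in zip(features, mask)) / total
            for d in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def select_reliable_memory(frame_prototypes, support_prototype, threshold=0.8):
    """Keep frame prototypes that agree with the support prototype.

    Assumption: cosine similarity stands in for the paper's structural
    similarity relation; the threshold is an arbitrary illustrative value.
    """
    return [p for p in frame_prototypes
            if cosine(p, support_prototype) >= threshold]

# Support prototype from one annotated support image (toy 2-D features).
support_proto = masked_average_prototype(
    [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]], [1, 1, 0])

# Frame prototypes from predicted masks on earlier query frames.
frame_protos = [
    masked_average_prototype([[1.0, 0.1], [0.0, 1.0]], [1, 0]),
    masked_average_prototype([[0.1, 1.0], [0.9, 0.0]], [1, 0]),  # noisy frame
    masked_average_prototype([[0.9, 0.2], [0.7, 0.0]], [1, 1]),
]

# Long-term guidance: average only the frames judged reliable.
reliable = select_reliable_memory(frame_protos, support_proto)
memory_proto = mean_vector(reliable)

# Local guidance: average the frame prototypes of the current clip.
clip_proto = mean_vector(frame_protos[-2:])
```

In this sketch the second frame is dropped from memory because its prototype points away from the support prototype, which mirrors the stated goal of reducing the influence of noisy memory frames.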

Code Implementations: 1 repo
