CV · Jun 9, 2022

VITA: Video Instance Segmentation via Object Token Association

arXiv:2206.04403v2 · 133 citations · h-index: 34 · Has Code
Originality: Highly original
AI Analysis

This work addresses offline video instance segmentation. It associates objects across frames through object tokens rather than backbone features, which improves benchmark accuracy and makes long, high-resolution videos tractable on a common GPU.

The paper tackles video instance segmentation by introducing VITA, a method that uses object tokens from an image detector to associate objects across frames without spatio-temporal backbone features, achieving state-of-the-art results with 49.8 AP on YouTube-VIS 2019 and 45.7 AP on YouTube-VIS 2021.

We introduce a novel paradigm for offline Video Instance Segmentation (VIS), based on the hypothesis that explicit object-oriented information can be a strong clue for understanding the context of the entire sequence. To this end, we propose VITA, a simple structure built on top of an off-the-shelf Transformer-based image instance segmentation model. Specifically, we use an image object detector as a means of distilling object-specific contexts into object tokens. VITA accomplishes video-level understanding by associating frame-level object tokens without using spatio-temporal backbone features. By effectively building relationships between objects using the condensed information, VITA achieves state-of-the-art results on VIS benchmarks with a ResNet-50 backbone: 49.8 AP and 45.7 AP on YouTube-VIS 2019 and 2021, respectively, and 19.6 AP on OVIS. Moreover, thanks to its object token-based structure that is disjoint from the backbone features, VITA shows several practical advantages that previous offline VIS methods have not explored: handling long and high-resolution videos with a common GPU, and freezing a frame-level detector trained on the image domain. Code is available at https://github.com/sukjunhwang/VITA.
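
The core idea in the abstract, associating frame-level object tokens instead of dense backbone features, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single-head, single-layer dot-product cross-attention in which a few video-level queries attend over all tokens gathered from every frame. The function names (`associate_tokens`, `token_vs_feature_count`), the shapes, and the numbers in the usage note are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def associate_tokens(frame_tokens, video_queries):
    """Cross-attend video-level queries over all frame-level object tokens.

    frame_tokens:  T frames, each a list of N token vectors of dimension C.
    video_queries: Q query vectors of dimension C (hypothetical stand-ins
                   for whatever video-level representation the model uses).
    Returns (updated_queries, attention); attention[q] holds T*N weights.
    """
    # Flatten (T, N, C) into (T*N, C). Only tokens are used here;
    # no spatio-temporal feature maps enter the computation.
    tokens = [tok for frame in frame_tokens for tok in frame]
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)

    updated, attention = [], []
    for query in video_queries:
        weights = softmax([dot(query, tok) * scale for tok in tokens])
        # Each updated query is a weighted average of the object tokens.
        mixed = [sum(w * tok[d] for w, tok in zip(weights, tokens))
                 for d in range(dim)]
        updated.append(mixed)
        attention.append(weights)
    return updated, attention


def token_vs_feature_count(num_frames, tokens_per_frame, height, width):
    """Vectors kept per clip: object tokens vs. one dense feature map per frame."""
    return num_frames * tokens_per_frame, num_frames * height * width


# Illustrative run with random vectors (assumed sizes, not from the paper).
rng = random.Random(0)
T, N, C, Q = 3, 4, 8, 2
frame_tokens = [[[rng.gauss(0, 1) for _ in range(C)] for _ in range(N)]
                for _ in range(T)]
video_queries = [[rng.gauss(0, 1) for _ in range(C)] for _ in range(Q)]
updated_queries, attention = associate_tokens(frame_tokens, video_queries)
```

Under these assumptions, each row of `attention` spans all `T * N` tokens, so a query can link the same object across frames. The helper `token_vs_feature_count` shows why a token-only design scales to long clips: with the assumed values of 36 frames, 100 tokens per frame, and a 45 by 80 feature map, it compares 3,600 token vectors against 129,600 feature-map positions.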
