CV AI MM
Jun 22, 2021

Tracking Instances as Queries

Shusheng Yang, Yuxin Fang, Xinggang Wang, Yu Li, Ying Shan, Bin Feng, Wenyu Liu

arXiv:2106.11963v27.312 citations

Originality: Incremental advance

AI Analysis

This paper addresses video instance segmentation, a computer vision task, offering a competitive end-to-end model with incremental improvements.

The paper tackles video instance segmentation by proposing QueryTrack, a unified query-based framework that leverages the one-to-one correspondence between instances and queries, achieving 52.7 AP on the YouTube-VIS-2019 dataset and 52.3 AP on the YouTube-VIS-2021 dataset.

Recently, query-based deep networks have attracted considerable attention owing to their end-to-end pipelines and competitive results on several fundamental computer vision tasks, such as object detection, semantic segmentation, and instance segmentation. However, how to establish a query-based video instance segmentation (VIS) framework with an elegant architecture and strong performance remains an open question. In this paper, we present QueryTrack (i.e., tracking instances as queries), a unified query-based VIS framework that fully leverages the intrinsic one-to-one correspondence between instances and queries in QueryInst. The proposed method obtains 52.7 / 52.3 AP on the YouTube-VIS-2019 / 2021 datasets, winning 2nd place in the YouTube-VIS Challenge at CVPR 2021 with a single online end-to-end model, single-scale testing, and a modest amount of training data. We also provide QueryTrack-ResNet-50 baseline results on the YouTube-VIS-2021 val set as a reference for the VIS community.
