CV | May 2, 2022

Boosting Video Object Segmentation based on Scale Inconsistency

arXiv:2205.01197v11.4 | h-index: 14 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses performance limitations in video object segmentation for computer vision applications, but it is incremental as it refines existing models rather than introducing a new paradigm.

The paper tackles the problem of inconsistent predictions in semi-supervised video object segmentation by proposing a refinement framework based on scale inconsistency, which improves performance on DAVIS datasets when applied to various pre-trained models.

We present a refinement framework to boost the performance of pre-trained semi-supervised video object segmentation (VOS) models. Our work is based on scale inconsistency, which is motivated by the observation that existing VOS models generate inconsistent predictions from input frames with different sizes. We use the scale inconsistency as a clue to devise a pixel-level attention module that aggregates the advantages of the predictions from different-size inputs. The scale inconsistency is also used to regularize the training based on a pixel-level variance measured by an uncertainty estimation. We further present a self-supervised online adaptation, tailored for test-time optimization, that bootstraps the predictions without ground-truth masks based on the scale inconsistency. Experiments on DAVIS 16 and DAVIS 17 datasets show that our framework can be generically applied to various VOS models and improve their performance.
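As a rough illustration of the scale-inconsistency idea in the abstract, the sketch below runs a stand-in predictor on several resized copies of a frame, measures the per-pixel variance across scales, and fuses the predictions with per-pixel weights. It is not the authors' implementation: the names `resize_nearest`, `multiscale_predictions`, `scale_inconsistency`, and `fuse` are hypothetical, the `predict` callable stands in for any pre-trained VOS model, and the confidence-based softmax weighting is a hand-crafted substitute for the learned pixel-level attention module.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D array (stand-in for a real resampler)."""
    in_h, in_w = img.shape
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols[None, :]]

def multiscale_predictions(frame, predict, scales=(0.5, 1.0, 1.5)):
    """Run `predict` on rescaled copies of `frame` and map results back to the original size.

    `predict` is an assumed callable: 2-D frame -> 2-D foreground probability map.
    """
    h, w = frame.shape
    preds = []
    for s in scales:
        sh, sw = max(1, int(round(h * s))), max(1, int(round(w * s)))
        p = predict(resize_nearest(frame, sh, sw))
        preds.append(resize_nearest(p, h, w))
    return np.stack(preds)  # shape: (num_scales, h, w)

def scale_inconsistency(preds):
    """Per-pixel variance of the predictions across scales."""
    return preds.var(axis=0)

def fuse(preds, temperature=0.1):
    """Fuse predictions with per-pixel softmax weights.

    Assumption: weights come from the confidence |p - 0.5|, a simple
    heuristic used here in place of a learned attention module.
    """
    conf = np.abs(preds - 0.5) / temperature
    conf = conf - conf.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(conf)
    weights = weights / weights.sum(axis=0, keepdims=True)
    return (weights * preds).sum(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((8, 12))
    # Toy predictor: a sigmoid of pixel intensity, not a real VOS model.
    toy_predict = lambda x: 1.0 / (1.0 + np.exp(-10.0 * (x - 0.5)))
    preds = multiscale_predictions(frame, toy_predict)
    print("mean inconsistency:", scale_inconsistency(preds).mean())
    print("fused shape:", fuse(preds).shape)
```

In this sketch, pixels where the scales disagree have high variance; a map of that kind could plausibly serve as the signal for regularization or test-time adaptation that the abstract describes, though the paper's actual losses and architecture are not reproduced here.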
