CV | Jun 27, 2023

TrickVOS: A Bag of Tricks for Video Object Segmentation

arXiv:2306.15377v2 | h-index: 18
Originality: Incremental advance
AI Analysis

This work provides incremental improvements for video object segmentation researchers and practitioners, enabling real-time performance on mobile devices.

The paper improves space-time memory networks for semi-supervised video object segmentation by addressing three aspects: supervisory signal, pretraining, and spatial awareness. The resulting lightweight network achieves competitive results on the DAVIS and YouTube benchmarks and runs in real time on mobile devices.

Space-time memory (STM) network methods have been dominant in semi-supervised video object segmentation (SVOS) due to their remarkable performance. In this work, we identify three key aspects where we can improve such methods: i) supervisory signal, ii) pretraining and iii) spatial awareness. We then propose TrickVOS, a generic, method-agnostic bag of tricks addressing each aspect with i) a structure-aware hybrid loss, ii) a simple decoder pretraining regime and iii) a cheap tracker that imposes spatial constraints on model predictions. Finally, we propose a lightweight network and show that, when trained with TrickVOS, it achieves results competitive with state-of-the-art methods on the DAVIS and YouTube benchmarks, while being one of the first STM-based SVOS methods that can run in real time on a mobile device.
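
The abstract does not say how the tracker constrains predictions, so the following is only an illustrative sketch of one simple way a spatial constraint could be imposed. It assumes the previous frame's mask gives a bounding box, and that the current frame's foreground probabilities are suppressed outside that box once it is grown by a margin. The names `mask_bbox`, `gate_by_previous_box` and `margin` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]      # binary mask, 1 = object pixel
Probs = List[List[float]]   # per-pixel foreground probability


def mask_bbox(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return the inclusive box (top, left, bottom, right), or None if the mask is empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        return None
    return min(rows), min(cols), max(rows), max(cols)


def gate_by_previous_box(probs: Probs, prev_mask: Mask, margin: int = 1) -> Probs:
    """Zero out probabilities outside the previous mask's box grown by `margin` pixels.

    Assumed behaviour for illustration only; not the tracker described in the paper.
    """
    box = mask_bbox(prev_mask)
    if box is None:
        # Nothing to track: return an unmodified copy (an assumed fallback).
        return [row[:] for row in probs]
    top, left, bottom, right = box
    top, left = top - margin, left - margin
    bottom, right = bottom + margin, right + margin
    return [
        [p if top <= r <= bottom and left <= c <= right else 0.0
         for c, p in enumerate(row)]
        for r, row in enumerate(probs)
    ]


# Usage: a single object pixel at (2, 2) in the previous frame of a 5x5 image.
prev_mask = [[1 if (r, c) == (2, 2) else 0 for c in range(5)] for r in range(5)]
probs = [[0.9] * 5 for _ in range(5)]
gated = gate_by_previous_box(probs, prev_mask, margin=1)
```

A gate like this is cheap because it needs only the previous mask and a comparison per pixel. Its weakness is that a fast-moving object can leave the box, so the choice of `margin` trades robustness against how much background is suppressed.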
