CV · Aug 6, 2021

Full-Duplex Strategy for Video Object Segmentation

arXiv:2108.03151v3 · 173 citations
AI Analysis

This work addresses inefficiencies in video object segmentation for computer vision applications, representing an incremental improvement over existing methods.

The paper tackles the limited feature collaboration between appearance and motion cues in video object segmentation by proposing a full-duplex strategy network (FSNet) with relational cross-attention and bidirectional purification modules, achieving favorable performance on five benchmarks against cutting-edge methods.

Previous video object segmentation approaches mainly focus on using simplex solutions between appearance and motion, limiting feature collaboration efficiency among and across these two cues. In this work, we study a novel and efficient full-duplex strategy network (FSNet) to address this issue, by considering a better mutual restraint scheme between motion and appearance in exploiting the cross-modal features from the fusion and decoding stages. Specifically, we introduce the relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding sub-spaces. To improve the model's robustness and update the inconsistent features from the spatial-temporal embeddings, we adopt the bidirectional purification module (BPM) after the RCAM. Extensive experiments on five popular benchmarks show that our FSNet is robust to various challenging scenarios (e.g., motion blur, occlusion) and achieves favourable performance against existing cutting-edge methods in both the video object segmentation and video salient object detection tasks. The project is publicly available at: https://dpfan.net/FSNet.
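The abstract does not give the RCAM's equations, so the following is only a generic illustration of what "bidirectional message propagation" between two feature streams can look like, not FSNet's implementation. It assumes plain scaled dot-product cross-attention with residual updates; the function names (`cross_attend`, `full_duplex_step`) and the toy feature vectors are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross_attend(queries, context):
    """Each query vector gathers a softmax-weighted mix of the context vectors."""
    scale = math.sqrt(len(queries[0]))
    messages = []
    for q in queries:
        weights = softmax([dot(q, c) / scale for c in context])
        mixed = [sum(w * c[d] for w, c in zip(weights, context))
                 for d in range(len(q))]
        messages.append(mixed)
    return messages

def full_duplex_step(appearance, motion):
    """Update both streams in one step; each receives a residual message
    computed from the other (assumed design, not the paper's RCAM)."""
    msg_to_app = cross_attend(appearance, motion)
    msg_to_mot = cross_attend(motion, appearance)
    new_app = [[a + m for a, m in zip(av, mv)]
               for av, mv in zip(appearance, msg_to_app)]
    new_mot = [[a + m for a, m in zip(av, mv)]
               for av, mv in zip(motion, msg_to_mot)]
    return new_app, new_mot

# Toy features: 3 appearance tokens and 2 motion tokens, each of dimension 2.
appearance = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
motion = [[0.5, 0.5], [1.0, -1.0]]
new_app, new_mot = full_duplex_step(appearance, motion)
```

A simplex scheme, by contrast, would compute only one of the two messages (for example, motion guiding appearance) and leave the other stream unchanged.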

Code Implementations: 1 repo
