CV · IV · Apr 9, 2024

Spatial-Temporal Multi-level Association for Video Object Segmentation

arXiv:2404.06265v1 · 6 citations · h-index: 23 · ECCV
Originality: Incremental advance
AI Analysis

This work addresses video object segmentation for computer vision applications, but it appears incremental because it builds on existing methods by combining spatial and temporal features.

The paper tackles two problems in semi-supervised video object segmentation, insufficient target interaction and inefficient parallel processing, by proposing a spatial-temporal multi-level association framework. It reports favorable performance against state-of-the-art methods on the DAVIS and YouTube-VOS benchmarks.

Existing semi-supervised video object segmentation methods either focus on temporal feature matching or spatial-temporal feature modeling. However, they do not address the issues of sufficient target interaction and efficient parallel processing simultaneously, thereby constraining the learning of dynamic, target-aware features. To tackle these limitations, this paper proposes a spatial-temporal multi-level association framework, which jointly associates reference frame, test frame, and object features to achieve sufficient interaction and parallel target ID association with a spatial-temporal memory bank for efficient video object segmentation. Specifically, we construct a spatial-temporal multi-level feature association module to learn better target-aware features, which formulates feature extraction and interaction as the efficient operations of object self-attention, reference object enhancement, and test reference correlation. In addition, we propose a spatial-temporal memory to assist feature association and temporal ID assignment and correlation. We evaluate the proposed method by conducting extensive experiments on numerous video object segmentation datasets, including DAVIS 2016/2017 val, DAVIS 2017 test-dev, and YouTube-VOS 2018/2019 val. The favorable performance against the state-of-the-art methods demonstrates the effectiveness of our approach. All source code and trained models will be made publicly available.
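The abstract does not detail how the memory bank or the parallel ID association work. As a rough illustration only, the sketch below shows a generic attention-based memory readout with per-object ID embeddings. This is a common way to label every target in a single pass instead of running one pass per object. Everything here is an assumption, not the authors' design: the `SpatialTemporalMemory` class, the one-hot ID embeddings (a learned model would more likely use learned embeddings), the `temperature` value, and the "always keep the annotated first frame, slide a window over the rest" eviction policy. The sketch also leaves out the object self-attention, reference object enhancement, and test reference correlation stages.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


class SpatialTemporalMemory:
    """Hypothetical memory bank storing (key, ID value) pairs for every pixel of kept frames.

    Frame 0 (the annotated reference) is never evicted; later frames live in a
    sliding window so that at most max_frames frames are stored.
    """

    def __init__(self, num_ids, max_frames=3, temperature=0.1):
        self.num_ids = num_ids          # background + target objects
        self.max_frames = max_frames
        self.temperature = temperature  # lower = sharper attention
        self.frames = []                # list of (keys, values), one entry per frame

    def write(self, keys, labels):
        # Encode each pixel's label as an ID embedding (one-hot for readability).
        values = [one_hot(label, self.num_ids) for label in labels]
        self.frames.append((keys, values))
        if len(self.frames) > self.max_frames:
            del self.frames[1]          # evict the oldest non-reference frame

    def read(self, query_keys):
        """Attention readout: every query pixel attends to all stored memory pixels."""
        mem_keys = [k for keys, _ in self.frames for k in keys]
        mem_vals = [v for _, vals in self.frames for v in vals]
        readout = []
        for q in query_keys:
            weights = softmax([dot(q, k) / self.temperature for k in mem_keys])
            readout.append([
                sum(w * v[i] for w, v in zip(weights, mem_vals))
                for i in range(self.num_ids)
            ])
        return readout

    def segment(self, query_keys):
        """Decode all IDs at once: per-pixel argmax over the ID scores."""
        scores = self.read(query_keys)
        return [max(range(self.num_ids), key=lambda i: s[i]) for s in scores]


# Reference frame: 4 pixels with 2-D features; label 0 = background, 1 and 2 = objects.
memory = SpatialTemporalMemory(num_ids=3)
memory.write(keys=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]],
             labels=[0, 1, 2, 0])

# Test frame: features drift slightly; both objects are labelled in one readout.
frame1_keys = [[0.9, 0.1], [0.1, 0.9], [-0.95, 0.05]]
frame1_labels = memory.segment(frame1_keys)
print(frame1_labels)  # [0, 1, 2]

# Write the prediction back so later frames can also match against it.
memory.write(frame1_keys, frame1_labels)
```

Encoding objects as ID embeddings keeps the cost of the readout independent of the number of targets. A per-object design would instead repeat the attention once per mask.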
