CV · Aug 22, 2022

SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization

arXiv:2208.10128v1 · 60 citations · h-index: 44 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses slow inference in video object segmentation for real-time applications. The contribution is incremental, as it builds on existing matching-based methods.

The paper tackles the inefficiency in semi-supervised video object segmentation caused by redundant template features in memory-based methods, proposing SWEM to reduce redundancy and achieve high performance (84.3% J&F on DAVIS 2017) at real-time speed (36 FPS).

Matching-based methods, especially those based on space-time memory, are significantly ahead of other solutions in semi-supervised video object segmentation (VOS). However, continuously growing and redundant template features lead to inefficient inference. To alleviate this, we propose a novel Sequential Weighted Expectation-Maximization (SWEM) network that greatly reduces the redundancy of memory features. Unlike previous methods, which only detect feature redundancy between frames, SWEM merges both intra-frame and inter-frame similar features by leveraging the sequential weighted EM algorithm. Further, adaptive weights for frame features give SWEM the flexibility to represent hard samples, improving the discrimination of templates. In addition, the proposed method maintains a fixed number of template features in memory, which keeps the inference complexity of the VOS system stable. Extensive experiments on the commonly used DAVIS and YouTube-VOS datasets verify the high efficiency (36 FPS) and high performance (84.3\% $\mathcal{J}\&\mathcal{F}$ on the DAVIS 2017 validation set) of SWEM. Code is available at: https://github.com/lmm077/SWEM.
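The idea of keeping a fixed number of template features while folding in new frames can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that). It is a generic weighted, soft-assignment EM update under stated assumptions: features are L2-normalized, responsibilities come from a softmax over cosine similarity, and the per-feature weights are simply supplied by the caller. The names `FixedSizeMemory`, `weighted_em_stats`, `temperature`, and `iters` are hypothetical.

```python
import numpy as np


def normalize(x, eps=1e-8):
    """L2-normalize each row of x."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def weighted_em_stats(features, weights, bases, temperature=0.05):
    """E-step plus weighted sufficient statistics for one frame.

    features: (N, C) unit vectors, weights: (N,), bases: (K, C) unit vectors.
    Returns the weighted feature sums (K, C) and the weighted mass (K,).
    """
    logits = features @ bases.T / temperature          # (N, K) similarities
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    resp = np.exp(logits)
    resp /= resp.sum(axis=1, keepdims=True)            # soft assignments
    weighted = resp * weights[:, None]                 # apply per-feature weights
    return weighted.T @ features, weighted.sum(axis=0)


class FixedSizeMemory:
    """Illustrative memory whose size stays at K bases for any number of frames."""

    def __init__(self, num_bases, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.bases = normalize(rng.standard_normal((num_bases, dim)))
        self.numer = np.zeros((num_bases, dim))  # accumulated weighted sums
        self.mass = np.zeros(num_bases)          # accumulated weighted counts

    def update(self, features, weights=None, iters=4):
        features = normalize(np.asarray(features, dtype=float))
        if weights is None:
            weights = np.ones(len(features))     # assumption: uniform weights
        for _ in range(iters):
            numer, mass = weighted_em_stats(features, weights, self.bases)
            # M-step: merge this frame's statistics with those of earlier frames.
            total = (self.numer + numer) / np.maximum(self.mass + mass, 1e-8)[:, None]
            self.bases = normalize(total)
        # Commit the final statistics so later frames build on them.
        self.numer += numer
        self.mass += mass
        return self.bases


memory = FixedSizeMemory(num_bases=8, dim=16)
frame_rng = np.random.default_rng(1)
for _ in range(3):
    memory.update(frame_rng.standard_normal((100, 16)))  # stand-in frame features
```

However many frames are fed to `update`, `memory.bases` keeps the shape `(8, 16)`, so the cost of matching a query against the memory does not grow with video length. In this sketch the weights are an input. How the paper derives its adaptive weights is not specified in the abstract.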

Code Implementations: 1 repo
