CV · Apr 3, 2023

Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network

arXiv:2304.01198v2 · 56 citations · h-index: 32 · Has Code
Originality: Highly original
AI Analysis

This work addresses a computational bottleneck in open-vocabulary semantic segmentation, making it more efficient for applications requiring real-time or large-scale processing.

The paper tackles the inefficiency of existing two-stream networks for open-vocabulary semantic segmentation, which pass many image crops through a visual-language model. It proposes a decoupled one-pass network that needs only a single pass through that model per image, and it reports state-of-the-art performance while being 4 to 7 times faster at inference.

Recently, the open-vocabulary semantic segmentation problem has attracted increasing attention, and the best-performing methods are based on two-stream networks: one stream generates mask proposals and the other classifies each segment using a pretrained visual-language model. However, existing two-stream methods must pass a large number of image crops (up to a hundred) through the visual-language model, which is highly inefficient. To address this problem, we propose a network that needs only a single pass through the visual-language model for each input image. Specifically, we first propose a novel network adaptation approach, termed patch severance, to restrict harmful interference between the patch embeddings in the pretrained visual encoder. We then propose classification anchor learning to encourage the network to focus spatially on more discriminative features for classification. Extensive experiments demonstrate that the proposed method surpasses state-of-the-art methods while being 4 to 7 times faster at inference. Code: https://github.com/CongHan0808/DeOP.git
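The efficiency argument above comes down to how many times the visual-language encoder runs per image. The sketch below is a toy illustration of that difference, not the DeOP implementation: it does not model patch severance or classification anchor learning. It assumes (hypothetically) that the encoder returns one embedding per patch, that each mask proposal is a set of patch indices, and that a segment is classified by average-pooling its patch embeddings and picking the class text embedding with the highest cosine similarity. `CountingEncoder`, `classify_per_crop` and `classify_one_pass` are illustrative names only.

```python
"""Toy comparison of per-crop vs. one-pass segment classification.

Hypothetical sketch: the encoder is a stub, masks are sets of patch indices,
and segments are classified by masked average pooling + cosine similarity.
None of this is taken from the DeOP code base.
"""
import math
from typing import Callable, Dict, List, Set

Vector = List[float]
Encoder = Callable[[object], List[Vector]]  # image (or crop) -> one embedding per patch


class CountingEncoder:
    """Stand-in for a frozen visual encoder.

    It ignores its input and returns fixed patch embeddings, but counts
    how many forward passes were requested.
    """

    def __init__(self, patch_embeddings: List[Vector]) -> None:
        self.patch_embeddings = patch_embeddings
        self.calls = 0

    def __call__(self, image: object) -> List[Vector]:
        self.calls += 1
        return self.patch_embeddings


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_pool(patches: List[Vector], mask: Set[int]) -> Vector:
    """Average the embeddings of the patches covered by one mask proposal."""
    dim = len(patches[0])
    return [sum(patches[i][d] for i in mask) / len(mask) for d in range(dim)]


def best_class(segment: Vector, text_emb: Dict[str, Vector]) -> str:
    """Pick the class name whose text embedding is most similar to the segment."""
    return max(text_emb, key=lambda name: cosine(segment, text_emb[name]))


def classify_per_crop(encoder: Encoder, image: object, masks: List[Set[int]],
                      text_emb: Dict[str, Vector]) -> List[str]:
    """Simplified two-stream baseline: one encoder pass per mask proposal."""
    labels = []
    for mask in masks:
        patches = encoder((image, frozenset(mask)))  # hypothetical crop handle
        labels.append(best_class(mean_pool(patches, mask), text_emb))
    return labels


def classify_one_pass(encoder: Encoder, image: object, masks: List[Set[int]],
                      text_emb: Dict[str, Vector]) -> List[str]:
    """One encoder pass per image; every mask reuses the same patch embeddings."""
    patches = encoder(image)
    return [best_class(mean_pool(patches, mask), text_emb) for mask in masks]


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    masks = [{0, 1}, {2, 3}]
    text_emb = {"cat": [1.0, 0.0], "grass": [0.0, 1.0]}
    enc = CountingEncoder(patches)
    print(classify_one_pass(enc, "image", masks, text_emb), enc.calls)  # ['cat', 'grass'] 1
```

With N mask proposals, the per-crop baseline calls the encoder N times, while the one-pass variant calls it once and reuses the dense patch embeddings. The paper's contributions (patch severance and classification anchor learning) address the accuracy problems of this kind of reuse, which this toy does not attempt to reproduce.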

Code Implementations: 1 repo
