CV · Dec 16, 2024

SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation

arXiv:2412.11890v220.956 citations · h-index: 6 · Has Code · CVPR

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient and accurate semantic segmentation in computer vision, offering incremental improvements over existing methods.

The paper proposes SegMAN, a semantic segmentation model that combines global context modeling, local detail encoding, and multi-scale feature extraction. The full SegMAN-B model reaches 52.6% mIoU on ADE20K (+1.6% over SegNeXt-L) and 83.8% mIoU on Cityscapes (+2.1% over SegFormer-B3).

High-quality semantic segmentation relies on three key capabilities: global context modeling, local detail encoding, and multi-scale feature extraction. However, recent methods struggle to possess all these capabilities simultaneously. Hence, we aim to empower segmentation networks to simultaneously carry out efficient global context modeling, high-quality local detail encoding, and rich multi-scale feature representation for varying input resolutions. In this paper, we introduce SegMAN, a novel linear-time model comprising a hybrid feature encoder dubbed SegMAN Encoder, and a decoder based on state space models. Specifically, the SegMAN Encoder synergistically integrates sliding local attention with dynamic state space models, enabling highly efficient global context modeling while preserving fine-grained local details. Meanwhile, the MMSCopE module in our decoder enhances multi-scale context feature extraction and adaptively scales with the input resolution. Our SegMAN-B Encoder achieves 85.1% ImageNet-1k accuracy (+1.5% over VMamba-S with fewer parameters). When paired with our decoder, the full SegMAN-B model achieves 52.6% mIoU on ADE20K (+1.6% over SegNeXt-L with 15% fewer GFLOPs), 83.8% mIoU on Cityscapes (+2.1% over SegFormer-B3 with half the GFLOPs), and 1.6% higher mIoU than VWFormer-B3 on COCO-Stuff with lower GFLOPs. Our code is available at https://github.com/yunxiangfu2001/SegMAN.
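The segmentation results above are reported in mIoU (mean intersection-over-union). As a reading aid, here is a minimal sketch of how that metric is commonly computed. It is not taken from the SegMAN code: the function name `mean_iou`, the flattened-list inputs, and the ignore label 255 are assumptions made for illustration, and benchmark toolkits usually accumulate the counts over a whole validation set rather than one image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred and target are flat sequences of integer class labels of equal length.
    Predictions are assumed to lie in [0, num_classes); pixels whose target
    equals ignore_index (an assumed convention) are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            # Pixel counts once toward both intersection and union of class t.
            intersection[t] += 1
            union[t] += 1
        else:
            # Mismatch: the pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and one ignored pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so a reported gain such as +1.6% mIoU corresponds to a 0.016 increase in this class-averaged ratio.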
