CV · Aug 11, 2024

MacFormer: Semantic Segmentation with Fine Object Boundaries

arXiv:2408.05699v1 · 6 citations · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses imprecise object boundaries in semantic segmentation, a concern for applications that need fine-grained masks, and represents an incremental improvement over existing methods.

The paper tackles imprecise object-boundary predictions in semantic segmentation by introducing MacFormer. It combines a Mutual Agent Cross-Attention mechanism, which integrates features between encoder and decoder, with a Frequency Enhancement Module, which improves boundary handling. The authors report better accuracy and efficiency than existing methods on the ADE20K and Cityscapes datasets.

Semantic segmentation involves assigning a specific category to each pixel in an image. While Vision Transformer-based models have made significant progress, current semantic segmentation methods often struggle with precise predictions in localized areas like object boundaries. To tackle this challenge, we introduce a new semantic segmentation architecture, "MacFormer", which features two key components. First, using learnable agent tokens, a Mutual Agent Cross-Attention (MACA) mechanism facilitates the bidirectional integration of features across encoder and decoder layers. This enables better preservation of low-level features, such as elementary edges, during decoding. Second, a Frequency Enhancement Module (FEM) in the decoder leverages high-frequency and low-frequency components to boost features in the frequency domain, benefiting object boundaries with minimal increase in computational complexity. MacFormer is demonstrated to be compatible with various network architectures and outperforms existing methods in both accuracy and efficiency on the benchmark datasets ADE20K and Cityscapes under different computational constraints.
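
The abstract does not give equations for MACA or FEM, so the following is only a minimal NumPy sketch of the two ideas, not the authors' implementation. It assumes single-head attention without learned projections, a fixed array standing in for the learnable agent tokens, and a hard radial low-pass mask with hand-picked gains. The names `attend`, `agent_cross_attention`, `frequency_enhance`, `radius`, `low_gain`, and `high_gain` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys_values):
    """Single-head scaled dot-product attention (no learned projections)."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys_values.T / np.sqrt(d))
    return weights @ keys_values


def agent_cross_attention(enc, dec, agents):
    """Exchange information between enc (N_e, d) and dec (N_d, d) via agents (A, d).

    Assumption: agents first summarize each stream, then each stream reads
    the summary of the other one, giving a bidirectional exchange.
    """
    enc_summary = attend(agents, enc)  # (A, d)
    dec_summary = attend(agents, dec)  # (A, d)
    dec_out = dec + attend(dec, enc_summary)  # decoder reads encoder summary
    enc_out = enc + attend(enc, dec_summary)  # encoder reads decoder summary
    return enc_out, dec_out


def frequency_enhance(feat, radius=0.25, low_gain=1.0, high_gain=1.5):
    """Split feat (C, H, W) into low/high frequency bands and reweight them.

    Assumption: a hard radial mask in normalized frequency (cycles per pixel)
    defines the low band; the high band is the residual.
    """
    _, height, width = feat.shape
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    low_mask = np.sqrt(fy**2 + fx**2) <= radius
    spectrum = np.fft.fft2(feat, axes=(-2, -1))
    low = np.fft.ifft2(spectrum * low_mask, axes=(-2, -1)).real
    high = feat - low
    return low_gain * low + high_gain * high


rng = np.random.default_rng(0)
enc = rng.normal(size=(64, 8))    # e.g. 64 encoder tokens of width 8
dec = rng.normal(size=(16, 8))    # e.g. 16 decoder tokens of width 8
agents = rng.normal(size=(4, 8))  # stand-in for learnable agent tokens
enc_out, dec_out = agent_cross_attention(enc, dec, agents)
boosted = frequency_enhance(rng.normal(size=(3, 32, 32)))
```

Routing attention through a small set of agents costs on the order of N·A per stream instead of N_e·N_d for direct cross-attention, which is one plausible reading of the efficiency claim. With `high_gain` above 1, the sketch amplifies high-frequency content such as edges while leaving smooth regions unchanged.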

