CV | Nov 25, 2024

Deformable Mamba for Wide Field of View Segmentation

arXiv:2411.16481v2 | 10 citations | h-index: 22
AI Analysis

This work addresses performance degradation in distortion-prone dense prediction tasks in computer vision and represents an incremental advancement.

The paper tackles the problem of designing a Mamba-based decoder for wide field of view segmentation, achieving a +2.5% performance improvement on the 360° Stanford2D3D benchmark while reducing parameters by 72% and FLOPs by 97% compared to widely used decoder heads.

Recent advancements in the Mamba architecture, with its linear computational complexity, make it a promising alternative to transformer architectures, which suffer from quadratic complexity. While existing works primarily focus on adapting Mamba as a vision encoder, the critical role of task-specific Mamba decoders remains under-explored, particularly for distortion-prone dense prediction tasks. This paper addresses two interconnected challenges: (1) the design of a Mamba-based decoder that seamlessly adapts to various architectures (e.g., CNN-, Transformer-, and Mamba-based backbones), and (2) the performance degradation of decoders that lack distortion-aware capability when processing wide-FoV images (e.g., 180° fisheye and 360° panoramic settings). We propose the Deformable Mamba Decoder, an efficient distortion-aware decoder that integrates Mamba's computational efficiency with adaptive distortion awareness. Comprehensive experiments on five wide-FoV segmentation benchmarks validate its effectiveness. Notably, our decoder achieves a +2.5% performance improvement on the 360° Stanford2D3D segmentation benchmark while reducing parameters by 72% and FLOPs by 97% compared to widely used decoder heads.
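The linear-versus-quadratic contrast in the abstract can be illustrated with a minimal sketch. This is not the paper's Deformable Mamba Decoder: it is a plain scalar state-space recurrence with fixed parameters, whereas Mamba additionally makes these parameters input-dependent. The function names and the parameters `a`, `b`, `c` are hypothetical and chosen only for illustration.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Scalar linear state-space recurrence (illustrative, not the paper's method).

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    """
    h = 0.0
    ys = []
    for x in xs:  # one constant-cost state update per token, so O(T) overall
        h = a * h + b * x
        ys.append(c * h)
    return ys


def scan_updates(seq_len):
    """State updates performed by the recurrence for a length-T sequence."""
    return seq_len


def attention_pairs(seq_len):
    """Query-key pairs scored by full self-attention for a length-T sequence."""
    return seq_len * seq_len


if __name__ == "__main__":
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5))
    for t in (1024, 4096):
        print(t, scan_updates(t), attention_pairs(t))
```

Under these assumptions, quadrupling the sequence length quadruples the number of recurrence updates but multiplies the number of attention pairs by sixteen, which is the scaling gap the abstract refers to.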

Code Implementations: 1 repo
