CV · AI · Jul 2, 2024

MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders

arXiv:2407.02228v2 · 38 citations · h-index: 15 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses multi-task dense prediction for scene understanding applications, representing an incremental advance by adapting Mamba to this domain.

The paper tackles multi-task dense scene understanding by proposing MTMamba, a Mamba-based architecture that models long-range dependencies and cross-task interactions. On PASCAL-Context, it reports improvements over the previous best methods of +2.08 on semantic segmentation, +5.01 on human parsing, and +4.90 on object boundary detection.

Multi-task dense scene understanding, which learns a model for multiple dense prediction tasks, has a wide range of application scenarios. Modeling long-range dependency and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba, a novel Mamba-based architecture for multi-task scene understanding. It contains two types of core blocks: self-task Mamba (STM) block and cross-task Mamba (CTM) block. STM handles long-range dependency by leveraging Mamba, while CTM explicitly models task interactions to facilitate information exchange across tasks. Experiments on NYUDv2 and PASCAL-Context datasets demonstrate the superior performance of MTMamba over Transformer-based and CNN-based methods. Notably, on the PASCAL-Context dataset, MTMamba achieves improvements of +2.08, +5.01, and +4.90 over the previous best methods in the tasks of semantic segmentation, human parsing, and object boundary detection, respectively. The code is available at https://github.com/EnVision-Research/MTMamba.
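To make the two block types described in the abstract more concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes a scalar, fixed-parameter linear state-space recurrence as a stand-in for Mamba (real Mamba uses input-dependent parameters and multi-dimensional states), and it assumes a simple sigmoid gate for mixing a shared feature into each task's feature. The names `ssm_scan`, `self_task_block`, `cross_task_block`, and `gate_logit` are hypothetical and chosen only for illustration.

```python
import math


def ssm_scan(xs, a=0.9, b=0.1, c=1.0):
    """Scalar linear state-space recurrence (illustrative stand-in for Mamba).

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    The state h carries information from all earlier positions, which is
    how this family of models captures long-range dependency.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def self_task_block(feat):
    """Hypothetical self-task block: scan one task's features, add a residual."""
    return [f + y for f, y in zip(feat, ssm_scan(feat))]


def cross_task_block(task_feats, gate_logit=0.0):
    """Hypothetical cross-task block (an assumption, not the paper's exact design).

    1. Build a shared sequence from the element-wise mean of all task features.
    2. Scan the shared sequence to obtain a cross-task feature.
    3. Mix the cross-task feature into each task with a sigmoid gate.
    """
    n_tasks = len(task_feats)
    length = len(task_feats[0])
    mean = [sum(f[i] for f in task_feats) / n_tasks for i in range(length)]
    shared = ssm_scan(mean)
    g = sigmoid(gate_logit)  # a fixed gate here; a real model would learn it
    return [
        [g * s + (1.0 - g) * x for s, x in zip(shared, feat)]
        for feat in task_feats
    ]


if __name__ == "__main__":
    # Two toy "tasks", each with a flattened feature sequence of length 4.
    seg = [1.0, 0.0, 0.0, 0.0]
    depth = [0.0, 0.0, 0.0, 1.0]
    refined = [self_task_block(f) for f in (seg, depth)]
    fused = cross_task_block(refined)
    for name, feat in zip(("seg", "depth"), fused):
        print(name, [round(v, 4) for v in feat])
```

In this sketch each task first refines its own sequence, then every task receives the same scanned shared feature, so information from one task can reach the others. The official code in the repository linked above is the reference for the actual block design.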

Code Implementations: 1 repo