UAM: A Unified Attention-Mamba Backbone of Multimodal Framework for Tumor Cell Classification
This work addresses tumor analysis in medical imaging, offering incremental improvements (4 to 5 percentage points) on multimodal cell classification and tumor segmentation tasks.
The paper tackles tumor cell classification and segmentation by introducing a Unified Attention-Mamba (UAM) backbone that flexibly combines attention and Mamba modules. It reports state-of-the-art results, raising cell classification accuracy from 74% to 78% and tumor segmentation precision from 75% to 80%.
Inspired by the recent success of the Mamba architecture in vision and language domains, we introduce a Unified Attention-Mamba (UAM) backbone. Unlike previous hybrid approaches that integrate Attention and Mamba modules in fixed proportions, our unified design flexibly combines their capabilities within a single cohesive architecture, eliminating the need for manual ratio tuning and improving encoding capability. We develop two UAM variants to comprehensively evaluate the benefits of this unified structure. Building on this backbone, we further propose a multimodal UAM framework that jointly performs cell-level classification and image segmentation. Experimental results demonstrate that UAM achieves state-of-the-art performance on both tasks on public benchmarks, surpassing leading image-based foundation models. It improves cell classification accuracy from 74\% to 78\% ($n$=349,882 cells) and tumor segmentation precision from 75\% to 80\% ($n$=406 patches).
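To make the idea of combining attention and a state-space branch without a fixed ratio more concrete, the following is a minimal illustrative sketch, not the UAM design itself, which the abstract does not specify. It assumes one possible mechanism: a per-token sigmoid gate that mixes single-head self-attention with a toy diagonal linear state-space scan. The scan is a simplified, non-selective stand-in for a Mamba block, and all names (`hybrid_block`, `diagonal_ssm`, `init_params`, the `wg` gate weights) are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a (T, D) sequence."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v)

def diagonal_ssm(x, a, b, c):
    """Toy diagonal linear state-space scan (assumed stand-in for Mamba).

    Per channel: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t.
    Unlike Mamba, a, b, c here do not depend on the input.
    """
    h = [0.0] * len(a)
    out = []
    for x_t in x:
        h = [a_i * h_i + b_i * x_i for a_i, h_i, b_i, x_i in zip(a, h, b, x_t)]
        out.append([c_i * h_i for c_i, h_i in zip(c, h)])
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def hybrid_block(x, p):
    """Residual block mixing both branches with a learned gate (an assumption)."""
    attn = self_attention(x, p["wq"], p["wk"], p["wv"])
    ssm = diagonal_ssm(x, p["a"], p["b"], p["c"])
    gate = [[sigmoid(v) for v in row] for row in matmul(x, p["wg"])]
    return [
        [x_i + g * a_i + (1.0 - g) * s_i for x_i, g, a_i, s_i in zip(xr, gr, ar, sr)]
        for xr, gr, ar, sr in zip(x, gate, attn, ssm)
    ]

def init_params(d, seed=0):
    """Random parameters for a D-channel block (hypothetical initialisation)."""
    rng = random.Random(seed)

    def mat():
        return [[rng.gauss(0.0, d ** -0.5) for _ in range(d)] for _ in range(d)]

    return {
        "wq": mat(), "wk": mat(), "wv": mat(), "wg": mat(),
        "a": [rng.uniform(0.5, 0.99) for _ in range(d)],  # decay in (0, 1)
        "b": [rng.gauss(0.0, 1.0) for _ in range(d)],
        "c": [rng.gauss(0.0, 1.0) for _ in range(d)],
    }
```

Calling `hybrid_block(x, init_params(d))` on a sequence `x` of `T` tokens with `d` channels returns a sequence of the same shape. In this sketch the balance between the two branches is set per token and per channel by the gate rather than by a hand-chosen module ratio, which is one way to read "no manual ratio tuning"; the paper's actual mechanism may differ.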