Mask-aware inference with State-Space Models
This work provides a mechanism for State Space Models to robustly handle missing data, which is crucial for real-world computer vision applications where inputs are often incomplete.
This paper addresses the challenge of handling arbitrarily shaped missing data in computer vision tasks when using State Space Models (SSMs). It introduces Partial Vision Mamba (PVM), which adapts the mask-aware re-normalization principles of Partial Convolutions to the Mamba architecture. The method demonstrates efficacy across depth completion, image inpainting, and classification with invalid data.
Many real-world computer vision tasks, such as depth completion, must handle inputs with arbitrarily shaped regions of missing or invalid data. For Convolutional Neural Networks (CNNs), Partial Convolutions address this with a mask-aware re-normalization that conditions each output only on valid pixels. More recently, State Space Models (SSMs) such as Mamba have emerged, offering high performance with linear complexity. However, these architectures lack an inherent mechanism for handling arbitrarily shaped invalid data at inference time. To bridge this gap, we introduce Partial Vision Mamba (PVM), a novel architectural component that ports the principles of partial operations to the Mamba backbone. We also define a set of rules for designing architectures with PVM. We show the efficacy and generalizability of our approach on depth completion, image inpainting, and classification with invalid data.
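For readers unfamiliar with the CNN-side operation that PVM builds on, the mask-aware re-normalization of Partial Convolutions can be sketched as below. This is a minimal single-channel NumPy illustration of the partial convolution only, not of PVM or its Mamba-specific formulation, which the abstract does not detail. The function name `partial_conv2d` and the stride-1, no-padding setting are assumptions made for the example.

```python
import numpy as np

def partial_conv2d(x, mask, weight, bias=0.0):
    """Single-channel partial convolution sketch (stride 1, no padding).

    x      : 2D input array; values at invalid positions are ignored.
    mask   : 2D array of the same shape, 1 = valid pixel, 0 = invalid.
    weight : 2D kernel.
    Returns the re-normalized output and the updated validity mask.
    """
    kh, kw = weight.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    new_mask = np.zeros((out_h, out_w))
    window_size = kh * kw
    for i in range(out_h):
        for j in range(out_w):
            m = mask[i:i + kh, j:j + kw]
            valid = m.sum()
            if valid > 0:
                # Use valid pixels only, then rescale by the
                # fraction of the window that was valid.
                patch = x[i:i + kh, j:j + kw] * m
                out[i, j] = (weight * patch).sum() * (window_size / valid) + bias
                # The output is valid if any input in the window was valid.
                new_mask[i, j] = 1.0
    return out, new_mask

# Hypothetical usage: a constant image with a hole, filtered by a mean kernel.
x = np.ones((5, 5))
mask = np.ones((5, 5))
mask[1:3, 1:3] = 0.0          # 2x2 block of invalid pixels
x[1:3, 1:3] = 123.0           # garbage values under the hole
kernel = np.full((3, 3), 1.0 / 9.0)
out, new_mask = partial_conv2d(x, mask, kernel)
```

In this usage, the garbage values under the hole do not affect the output, and the re-normalization keeps the response of the mean filter at the level of the valid pixels instead of shrinking it in proportion to the number of missing ones.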