CV AI Mar 13, 2025

OuroMamba: A Data-Free Quantization Framework for Vision Mamba Models

Akshat Ramachandran, Mingyu Lee, Huan Xu, Souvik Kundu, Tushar Krishna

arXiv:2503.10959v1, 13.16 citations, h-index: 7

Originality: Highly original

AI Analysis

The work enables efficient deployment of vision Mamba models without real data, an incremental but important step for resource-constrained applications.

The paper tackles data-free post-training quantization for vision Mamba models by addressing two challenges, semantically weak synthetic data and dynamic activation outliers, and reports state-of-the-art performance and up to 2.36x latency speedup.

We present OuroMamba, the first data-free post-training quantization (DFQ) method for vision Mamba-based models (VMMs). We identify two key challenges in enabling DFQ for VMMs: (1) VMMs' recurrent state transitions restrict the capture of long-range interactions and lead to semantically weak synthetic data, and (2) VMM activations exhibit dynamic outlier variations across time-steps, rendering existing static PTQ techniques ineffective. To address these challenges, OuroMamba presents a two-stage framework: (1) OuroMamba-Gen generates semantically rich and meaningful synthetic data by applying contrastive learning on patch-level VMM features generated through neighborhood interactions in the latent state space, and (2) OuroMamba-Quant employs mixed-precision quantization with lightweight dynamic outlier detection during inference. Specifically, we present a thresholding-based outlier channel selection strategy for activations that is updated every time-step. Extensive experiments across vision and generative tasks show that our data-free OuroMamba surpasses existing data-driven PTQ techniques, achieving state-of-the-art performance across diverse quantization settings. Additionally, we implement efficient GPU kernels to achieve a practical latency speedup of up to 2.36x. Code will be released soon.
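The abstract does not spell out the outlier rule, so the following is only a minimal sketch of what thresholding-based outlier channel selection with per-time-step updates could look like; it is not the authors' implementation. It assumes that a channel counts as an outlier when its peak magnitude at the current time-step exceeds a fixed threshold, that outlier channels get 8 bits and the rest 4 bits, and that quantization is uniform and symmetric. The function names `quantize_symmetric` and `quantize_step`, the threshold value, and the bit-widths are all hypothetical.

```python
def quantize_symmetric(values, n_bits):
    """Fake-quantize a list of floats with one shared symmetric scale."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    # Round to the integer grid, clip, then map back to floats.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def quantize_step(x_t, threshold, low_bits=4, high_bits=8):
    """Quantize one time-step of activations (rows of per-channel values).

    Returns the dequantized activations and the per-channel outlier flags.
    The peak-magnitude rule and the bit-widths are assumptions of this sketch.
    """
    n_channels = len(x_t[0])
    # Re-detect outlier channels from this time-step's own statistics.
    peaks = [max(abs(row[c]) for row in x_t) for c in range(n_channels)]
    is_outlier = [p > threshold for p in peaks]
    out = [[0.0] * n_channels for _ in x_t]
    # Outlier channels share one high-precision scale, the rest one low-precision scale.
    for flag, bits in ((True, high_bits), (False, low_bits)):
        cols = [c for c in range(n_channels) if is_outlier[c] == flag]
        flat = [row[c] for row in x_t for c in cols]
        quantized = iter(quantize_symmetric(flat, bits))
        for r in range(len(x_t)):
            for c in cols:
                out[r][c] = next(quantized)
    return out, is_outlier


# Toy activations: the outlier channel moves from index 2 to index 0.
step0 = [[0.5, -0.2, 40.0, 0.1], [0.3, 0.7, -35.0, -0.4]]
step1 = [[30.0, -0.2, 0.4, 0.1], [-25.0, 0.7, 0.2, -0.4]]
for t, x_t in enumerate((step0, step1)):
    _, flags = quantize_step(x_t, threshold=3.0)
    print(t, flags)
```

Because the flags are recomputed at every step, the high-precision budget follows the outlier as it moves between channels, which a channel set fixed offline by static calibration cannot do.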
