CV | Jul 6, 2025

MambaVideo for Discrete Video Tokenization with Channel-Split Quantization

Dawit Mureja Argaw, Xian Liu, Joon Son Chung, Ming-Yu Liu, Fitsum Reda

arXiv:2507.04559v1 | 10.24 citations | h-index: 10

Originality Highly original

AI Analysis

This work addresses the challenge of processing high-dimensional video data, a concern for researchers and practitioners in video generation, though its advance over existing tokenization methods appears incremental.

The paper tackles the problem of discrete video tokenization for efficient autoregressive generative modeling by introducing a Mamba-based encoder-decoder architecture and a channel-split quantization scheme. The result is a state-of-the-art model that outperforms existing approaches across multiple datasets.

Discrete video tokenization is essential for efficient autoregressive generative modeling due to the high dimensionality of video data. This work introduces a state-of-the-art discrete video tokenizer with two key contributions. First, we propose a novel Mamba-based encoder-decoder architecture that overcomes the limitations of previous sequence-based tokenizers. Second, we introduce a new quantization scheme, channel-split quantization, which significantly enhances the representational power of quantized latents while preserving the token count. Our model sets a new state-of-the-art, outperforming both causal 3D convolution-based and Transformer-based approaches across multiple datasets. Experimental results further demonstrate its robustness as a tokenizer for autoregressive video generation.
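
The abstract names channel-split quantization but does not describe its mechanics. As a rough illustration of the general idea of quantizing groups of latent channels separately, the sketch below splits one latent vector into equal channel groups and maps each group to its nearest entry in a per-group codebook. This is a generic, product-quantization-style example and may differ from the paper's actual scheme. The function names, the toy codebooks, and the squared-L2 nearest-neighbor rule are all assumptions made for illustration, and the sketch does not model how the paper keeps the token count unchanged.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def channel_split_quantize(
    latent: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], List[float]]:
    """Quantize each channel group of `latent` with its own codebook.

    Hypothetical helper: one codebook per group, equal group widths.
    Returns the per-group code indices and the reassembled quantized vector.
    """
    groups = len(codebooks)
    if groups == 0 or len(latent) % groups != 0:
        raise ValueError("channel count must be divisible by the number of groups")
    width = len(latent) // groups

    indices: List[int] = []
    quantized: List[float] = []
    for g, codebook in enumerate(codebooks):
        chunk = latent[g * width:(g + 1) * width]  # channels of group g
        idx = nearest_code(chunk, codebook)
        indices.append(idx)
        quantized.extend(codebook[idx])
    return indices, quantized


# Toy example: 4 channels split into 2 groups of 2 channels each.
latent = [0.9, 0.1, -0.2, 1.1]
codebooks = [
    [[0.0, 0.0], [1.0, 0.0]],  # codebook for channels 0-1
    [[0.0, 1.0], [1.0, 1.0]],  # codebook for channels 2-3
]
indices, quantized = channel_split_quantize(latent, codebooks)
print(indices, quantized)  # [1, 0] [1.0, 0.0, 0.0, 1.0]
```

With G groups and K entries per codebook, each latent position can take K^G distinct quantized values, which is one way a split along channels can raise representational capacity relative to a single codebook of size K.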
