CV · May 5, 2024

Matten: Video Generation with Mamba-Attention

arXiv:2405.03025v2 · 33 citations · h-index: 12
AI Analysis

This work addresses video generation for AI and media applications, but it appears incremental, since it combines existing diffusion and attention methods in a hybrid approach.

The paper tackles video generation with Matten, a latent diffusion model built on a Mamba-Attention architecture, which performs competitively with Transformer-based and GAN-based models while achieving better FVD scores and efficiency.

In this paper, we introduce Matten, a cutting-edge latent diffusion model with a Mamba-Attention architecture for video generation. At minimal computational cost, Matten employs spatial-temporal attention for local video content modeling and bidirectional Mamba for global video content modeling. Our comprehensive experimental evaluation demonstrates that Matten performs competitively with current Transformer-based and GAN-based models on benchmarks, achieving superior FVD scores and efficiency. Additionally, we observe a direct positive correlation between the complexity of our model and the improvement in video quality, indicating the excellent scalability of Matten.
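The abstract splits the work between local modeling (spatial-temporal attention) and global modeling (bidirectional Mamba). The NumPy sketch below illustrates that division of labour on a latent video of shape (frames, height, width, channels). It is not Matten's implementation, and every name in it is hypothetical. It rests on several assumptions: single-head attention without learned projections; tokens flattened frame by frame in raster order for the global scan; and a sigmoid-gated linear recurrence (`selective_scan`) standing in for Mamba's selective state-space scan.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along one axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(x):
    # Single-head self-attention with no learned projections (a simplification).
    # x: (batch, n_tokens, dim) -> same shape.
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    return softmax(scores, axis=-1) @ x


def spatial_attention(v):
    # Local modeling within a frame: tokens attend only to tokens of the same frame.
    # v: (T, H, W, D) latent video.
    T, H, W, D = v.shape
    return attention(v.reshape(T, H * W, D)).reshape(T, H, W, D)


def temporal_attention(v):
    # Local modeling across time: each spatial location attends over its own T frames.
    T, H, W, D = v.shape
    x = v.transpose(1, 2, 0, 3).reshape(H * W, T, D)
    return attention(x).reshape(H, W, T, D).transpose(2, 0, 1, 3)


def selective_scan(x, w_gate):
    # Hypothetical stand-in for Mamba's selective SSM: a gated linear recurrence
    # h_t = a_t * h_{t-1} + (1 - a_t) * x_t, where the decay a_t in (0, 1)
    # depends on the current input (the "selective" part). x: (L, D).
    h = np.zeros(x.shape[-1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        a = 1.0 / (1.0 + np.exp(-x[t] * w_gate))
        h = a * h + (1.0 - a) * x[t]
        out[t] = h
    return out


def bidirectional_scan(v, w_gate):
    # Global modeling: flatten all T*H*W tokens (frame by frame, raster order),
    # scan forward and backward, and sum the two directions.
    T, H, W, D = v.shape
    seq = v.reshape(T * H * W, D)
    fwd = selective_scan(seq, w_gate)
    bwd = selective_scan(seq[::-1], w_gate)[::-1]
    return (fwd + bwd).reshape(T, H, W, D)


def hybrid_block(v, w_gate):
    # One hypothetical Mamba-Attention block with residual connections.
    v = v + spatial_attention(v)
    v = v + temporal_attention(v)
    return v + bidirectional_scan(v, w_gate)


rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 8, 8, 16))  # (frames, height, width, channels)
w_gate = rng.standard_normal(16)
out = hybrid_block(latent, w_gate)
print(out.shape)  # (4, 8, 8, 16)
```

The design choice this illustrates is cost: each attention call is quadratic only in the size of one group (the H·W tokens of a frame, or the T frames at one location), while the scan visits all T·H·W tokens in linear time. The block ordering and scan order used here are assumptions and may differ from the paper.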

