CV | May 23, 2024

DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis

arXiv:2405.14224v2 | 84 citations | h-index: 26 | Has Code
Originality: Incremental advance
AI Analysis

The paper addresses efficiency challenges in image generation that matter to AI researchers and practitioners, but the advance is incremental because it builds on existing diffusion and Mamba methods.

The paper tackles the high computational cost of Transformers in diffusion models for high-resolution image synthesis by proposing Diffusion Mamba (DiM), which uses Mamba as the backbone of a diffusion model. It reports efficient inference, a training strategy that pretrains at low resolution before fine-tuning at high resolution, and training-free upsampling to resolutions up to 1536x1536.

Diffusion models have achieved great success in image generation, with the backbone evolving from U-Net to Vision Transformers. However, the computational cost of Transformers is quadratic in the number of tokens, leading to significant challenges when dealing with high-resolution images. In this work, we propose Diffusion Mamba (DiM), which combines the efficiency of Mamba, a sequence model based on State Space Models (SSM), with the expressive power of diffusion models for efficient high-resolution image synthesis. To address the challenge that Mamba, as a sequence model, does not generalize naturally to 2D signals, we introduce several architectural designs, including multi-directional scans, learnable padding tokens at the end of each row and column, and lightweight local feature enhancement. Our DiM architecture achieves inference-time efficiency for high-resolution images. In addition, to further improve training efficiency for high-resolution image generation with DiM, we investigate a "weak-to-strong" training strategy that pretrains DiM on low-resolution images ($256\times 256$) and then fine-tunes it on high-resolution images ($512 \times 512$). We further explore training-free upsampling strategies that enable the model to generate higher-resolution images (e.g., $1024\times 1024$ and $1536\times 1536$) without further fine-tuning. Experiments demonstrate the effectiveness and efficiency of DiM. The code is available here: https://github.com/tyshiwo1/DiM-DiffusionMamba/
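
To make the multi-directional scans and the end-of-row and end-of-column padding tokens more concrete, here is a minimal sketch of how a 2D grid of patch tokens might be flattened into 1D sequences for a sequence model such as Mamba. It is an illustration under stated assumptions, not the authors' implementation: the function name `scan_orders`, the choice of four directions, and the placeholder index `PAD` are hypothetical. In the paper the padding tokens are learnable embeddings, while this sketch only marks where they would sit in each sequence.

```python
# Hypothetical placeholder index marking where a learnable padding token
# would be inserted (an assumption made for illustration only).
PAD = -1


def scan_orders(height, width, pad_id=PAD):
    """Return four 1D visiting orders for a height x width grid of patch tokens.

    Tokens are identified by their row-major index r * width + c. A padding
    marker is appended after every row (row scans) or every column (column
    scans) so that a 1D sequence model can tell where a line of the image ends.
    """
    grid = [[r * width + c for c in range(width)] for r in range(height)]

    # Row scan: left to right within a row, rows from top to bottom.
    row_forward = []
    for row in grid:
        row_forward.extend(row)
        row_forward.append(pad_id)

    # Column scan: top to bottom within a column, columns from left to right.
    col_forward = []
    for c in range(width):
        col_forward.extend(grid[r][c] for r in range(height))
        col_forward.append(pad_id)

    # Reversed scans are a simplification: the padding marker then
    # precedes each row or column instead of following it.
    return {
        "row_forward": row_forward,
        "row_backward": row_forward[::-1],
        "col_forward": col_forward,
        "col_backward": col_forward[::-1],
    }


if __name__ == "__main__":
    orders = scan_orders(2, 3)
    print(orders["row_forward"])  # [0, 1, 2, -1, 3, 4, 5, -1]
    print(orders["col_forward"])  # [0, 3, -1, 1, 4, -1, 2, 5, -1]
```

In a model, each order would be used to gather token embeddings before a sequence layer and to scatter the outputs back to the grid afterwards. How the directions are assigned to layers is not stated in the abstract and is left open here.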

Code Implementations: 1 repo
