CV | Jan 3, 2024

aMUSEd: An Open MUSE Reproduction

arXiv:2401.01808v1 | 29 citations | h-index: 15 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for efficient and interpretable text-to-image generation models, though it is incremental as it reproduces and adapts the existing MUSE approach.

The authors tackle text-to-image generation with aMUSEd, an open-source masked image model (MIM) based on MUSE. It targets fast image generation with only 10% of MUSE's parameters, and its two released checkpoints directly produce images at 256x256 and 512x512 resolution.

We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE's parameters, aMUSEd is focused on fast image generation. We believe MIM is under-explored compared to latent diffusion, the prevailing approach for text-to-image generation. Compared to latent diffusion, MIM requires fewer inference steps and is more interpretable. Additionally, MIM can be fine-tuned to learn additional styles with only a single image. We hope to encourage further exploration of MIM by demonstrating its effectiveness on large-scale text-to-image generation and releasing reproducible training code. We also release checkpoints for two models which directly produce images at 256x256 and 512x512 resolutions.
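The claim that MIM needs fewer inference steps can be illustrated with a toy sketch of iterative parallel decoding. This is not the released aMUSEd code. It assumes a MaskGIT-style cosine unmasking schedule, and the 256-token grid, 8192-entry vocabulary, 12-step budget, `MASK` sentinel, and `toy_predict` stand-in (random guesses in place of a transformer) are all hypothetical choices made for illustration.

```python
import math
import random

MASK = -1  # hypothetical sentinel marking a still-masked token position


def cosine_mask_count(step, total_steps, num_tokens):
    """Tokens left masked after `step` (1-indexed) under an assumed cosine schedule."""
    ratio = math.cos(math.pi / 2 * step / total_steps)
    return int(math.floor(ratio * num_tokens))


def toy_predict(tokens, vocab_size, rng):
    """Stand-in for the model: one (token, confidence) guess per position, all at once."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def masked_decode(num_tokens=256, vocab_size=8192, total_steps=12, seed=0):
    """Start fully masked; each step commits the most confident predictions in parallel."""
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens
    history = []  # number of masked tokens remaining after each step
    for step in range(1, total_steps + 1):
        preds = toy_predict(tokens, vocab_size, rng)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        keep_masked = cosine_mask_count(step, total_steps, num_tokens)
        # Rank masked positions by confidence and commit all but `keep_masked` of them.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        num_commit = len(masked) - keep_masked
        for i in masked[:num_commit]:
            tokens[i] = preds[i][0]
        history.append(sum(t == MASK for t in tokens))
    return tokens, history


if __name__ == "__main__":
    _, remaining = masked_decode()
    print(remaining)  # masked-token count after each of the 12 steps
```

Because every step fills in many tokens at once, the whole token grid is resolved in a fixed, small number of model calls, and the intermediate grids show which regions were committed at which step.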

Code Implementations: 1 repo
