LGAIApr 14, 2025

Efficient Generative Model Training via Embedded Representation Warmup

arXiv:2504.10188v32 citationsh-index: 3Has Code
Originality Highly original
AI Analysis

This addresses the problem of slow and complex training for generative models, offering a more efficient approach for researchers and practitioners.

The paper tackles the challenge of inefficient generative model training by decoupling semantic learning from synthesis details, achieving an 11.5x speedup to reach FID=1.41 in 350 epochs.

Generative models face a fundamental challenge: they must simultaneously learn high-level semantic concepts (what to generate) and low-level synthesis details (how to generate it). Conventional end-to-end training entangles these distinct, and often conflicting objectives, leading to a complex and inefficient optimization process. We argue that explicitly decoupling these tasks is key to unlocking more effective and efficient generative modeling. To this end, we propose Embedded Representation Warmup (ERW), a principled two-phase training framework. The first phase is dedicated to building a robust semantic foundation by aligning the early layers of a diffusion model with a powerful pretrained encoder. This provides a strong representational prior, allowing the second phase -- generative full training with alignment loss to refine the representation -- to focus its resources on high-fidelity synthesis. Our analysis confirms that this efficacy stems from functionally specializing the model's early layers for representation. Empirically, our framework achieves a 11.5$\times$ speedup in 350 epochs to reach FID=1.41 compared to single-phase methods like REPA. Code is available at https://github.com/LINs-lab/ERW.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes