CV · Mar 18, 2025

Deeply Supervised Flow-Based Generative Models

arXiv:2503.14494v2 · 13 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in training efficiency for flow-based generative models, offering incremental improvements in speed and quality for visual generation tasks.

The paper tackles the underutilization of inter-layer representations in flow-based generative models by introducing DeepFlow, which combines deep supervision with a Velocity Refiner with Acceleration (VeRA) block to align intermediate velocity features. On ImageNet, DeepFlow converges 8 times faster at equivalent performance and, with half the training time, lowers FID by 2.6 compared with previous flow-based models evaluated without classifier-free guidance.
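To make "deep supervision over balanced branches" concrete, here is a toy, pure-Python sketch under stated assumptions. It splits a stack of layers into equally deep, contiguous branches, reads out a velocity prediction after each branch, and averages the velocity regression loss over all branch outputs instead of using only the last one. The function names (`balanced_branches`, `forward_with_branches`, `deep_supervision_loss`), the shared readout `head`, the equal loss weighting, and the use of one common target for every branch are all hypothetical simplifications; the optional `refiner` is only a placeholder marking where a block between adjacent branches would sit and is not the paper's VeRA block.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def balanced_branches(num_layers: int, num_branches: int) -> List[range]:
    """Split layer indices into contiguous branches of equal depth."""
    if num_branches <= 0 or num_layers % num_branches != 0:
        raise ValueError("num_layers must be a positive multiple of num_branches")
    depth = num_layers // num_branches
    return [range(b * depth, (b + 1) * depth) for b in range(num_branches)]


def forward_with_branches(
    x: Vector,
    layers: Sequence[Layer],
    branches: Sequence[range],
    head: Layer,
    refiner: Optional[Layer] = None,
) -> List[Vector]:
    """Run the layer stack and read out a velocity prediction after every branch."""
    outputs: List[Vector] = []
    h = x
    for i, branch in enumerate(branches):
        for idx in branch:
            h = layers[idx](h)
        outputs.append(head(h))  # per-branch velocity readout (hypothetical shared head)
        if refiner is not None and i < len(branches) - 1:
            # Placeholder for a block between adjacent branches; NOT the paper's VeRA block.
            h = refiner(h)
    return outputs


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def deep_supervision_loss(branch_preds: Sequence[Vector], target: Vector) -> float:
    """Equal-weight average of the velocity MSE over all branch outputs (assumed weighting)."""
    return sum(mse(p, target) for p in branch_preds) / len(branch_preds)


if __name__ == "__main__":
    def add_one(h: Vector) -> Vector:
        return [v + 1.0 for v in h]

    branches = balanced_branches(4, 2)  # [range(0, 2), range(2, 4)]
    preds = forward_with_branches([0.0, 0.0], [add_one] * 4, branches, head=lambda h: h)
    print(preds)  # [[2.0, 2.0], [4.0, 4.0]]
    print(deep_supervision_loss(preds, [4.0, 4.0]))  # 2.0
```

The design point the sketch illustrates is that the shallow branch receives its own gradient signal (here its MSE of 4.0 dominates the averaged loss), rather than learning only through the final output.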

Flow-based generative models have charted an impressive path across multiple visual generation tasks by adhering to a simple principle: learning velocity representations of a linear interpolant. However, we observe that training velocity solely from the final-layer output underutilizes the rich inter-layer representations, potentially impeding model convergence. To address this limitation, we introduce DeepFlow, a novel framework that enhances velocity representation through inter-layer communication. DeepFlow partitions transformer layers into balanced branches with deep supervision and inserts a lightweight Velocity Refiner with Acceleration (VeRA) block between adjacent branches, which aligns the intermediate velocity features within transformer blocks. Powered by the improved deep supervision via the internal velocity alignment, DeepFlow converges 8 times faster on ImageNet with equivalent performance and further reduces FID by 2.6 while halving training time compared to previous flow-based models without classifier-free guidance. DeepFlow also outperforms baselines in text-to-image generation tasks, as evidenced by evaluations on MSCOCO and zero-shot GenEval.
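The "velocity of a linear interpolant" principle can be illustrated with a minimal pure-Python sketch. It assumes one common convention: x0 is a noise sample, x1 is a data sample, the interpolant is x_t = (1 - t) * x0 + t * x1, and the regression target is its time derivative x1 - x0. Some flow-based papers run time in the opposite direction, so this convention is an assumption rather than the paper's stated setup. The helper names are hypothetical, and forward Euler is used only as the simplest sampler for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def interpolate(x0: Sequence[float], x1: Sequence[float], t: float) -> Vector:
    """Linear interpolant x_t = (1 - t) * x0 + t * x1 (assumed convention: x0 noise, x1 data)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def velocity_target(x0: Sequence[float], x1: Sequence[float]) -> Vector:
    """Time derivative of the linear interpolant: d x_t / dt = x1 - x0, constant in t."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(
    x0: Sequence[float], velocity_fn: Callable[[Vector, float], Vector], steps: int
) -> Vector:
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with forward Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    noise, data = [0.0, 1.0], [2.0, -1.0]
    print(interpolate(noise, data, 0.5))  # [1.0, 0.0]
    v = velocity_target(noise, data)      # [2.0, -2.0]
    # With the exact (constant) velocity for this single pair, Euler lands on the data point.
    print(euler_sample(noise, lambda x, t: v, steps=4))  # [2.0, -1.0]
```

In training, a network would instead be regressed onto x1 - x0 at randomly drawn t with a squared-error loss; because many (noise, data) pairs pass through the same x_t, the learned velocity is an average and sampling needs more than one exact step.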
