CV · Apr 30

Representation Fréchet Loss for Visual Generation

arXiv:2604.2819090.13 citations
AI Analysis

For generative model practitioners, this work provides a practical training objective that improves visual quality and reveals limitations of FID as an evaluation metric.

The authors show that Fréchet Distance can be effectively optimized as a training objective by decoupling population size from batch size, achieving 0.72 FID on ImageNet 256x256 with a one-step generator and enabling multi-step generators to be repurposed into strong one-step generators without distillation or adversarial training.

We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation space. Our idea is simple: decouple the population size for FD estimation (e.g., 50k) from the batch size for gradient computation (e.g., 1024). We term this approach FD-loss. Optimizing FD-loss reveals several surprising findings. First, post-training a base generator with FD-loss in different representation spaces consistently improves visual quality. Under the Inception feature space, a one-step generator achieves 0.72 FID on ImageNet 256x256. Second, the same FD-loss repurposes multi-step generators into strong one-step generators without teacher distillation, adversarial training, or per-sample targets. Third, FID can misrank visual quality: modern representations can yield better samples despite worse Inception FID. This motivates FDr$^k$, a multi-representation metric. We hope this work will encourage further exploration of distributional distances in diverse representation spaces as both training objectives and evaluation metrics for generative models.
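For readers unfamiliar with the quantity being optimized, here is a minimal NumPy sketch of the standard Fréchet Distance between Gaussian fits of two feature sets, FD = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}). It is not the authors' FD-loss implementation: the abstract does not describe how the large population is maintained during training, so the sketch covers only the distance itself. The function names (`feature_stats`, `frechet_distance`, `fd_between_samples`) and the toy Gaussian "features" are assumptions made for illustration. The last lines compare an estimate from 64 samples with one from 5000 samples drawn from the same distribution, which hints at why a batch-sized population may be too small for a reliable estimate.

```python
import numpy as np

def feature_stats(feats):
    """Return the mean vector and covariance matrix of an (N, D) feature array."""
    mu = feats.mean(axis=0)
    sigma = np.cov(feats, rowvar=False)  # rows are samples, columns are features
    return mu, sigma

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet Distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Tr((S1 S2)^{1/2}) equals the sum of the square roots of the eigenvalues
    # of S1 S2, which are real and non-negative for PSD covariances.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)

def fd_between_samples(n, dim=16, seed=0):
    """FD between two independent size-n samples of the SAME toy distribution."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, dim))  # hypothetical stand-in for real features
    b = rng.standard_normal((n, dim))  # hypothetical stand-in for generated features
    return frechet_distance(*feature_stats(a), *feature_stats(b))

# Both sample sets come from one distribution, so the true FD is zero;
# any positive value here is estimation error.
fd_small = fd_between_samples(64)
fd_large = fd_between_samples(5000)
print(f"FD with 64 samples:   {fd_small:.4f}")
print(f"FD with 5000 samples: {fd_large:.4f}")
```

In this toy setting the small-sample estimate is visibly further from zero than the large-sample one, which is consistent with the abstract's choice of a population (e.g., 50k) much larger than the gradient batch (e.g., 1024).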

