LG CV | Feb 25, 2025

VCT: Training Consistency Models with Variational Noise Coupling

Gianluigi Silvestri, Luca Ambrogioni, Chieh-Hsin Lai, Yuhta Takida, Yuki Mitsufuji

arXiv:2502.18197v222.014 citations | h-index: 17 | Has Code | ICML

Originality: Incremental advance

AI Analysis

The work addresses a key training-stability problem for researchers and practitioners in generative modeling, offering an incremental improvement over existing CT methods.

The paper tackles the high variance and instability of non-distillation Consistency Training (CT) for image generation. It proposes Variational Consistency Training (VCT), which uses a learned noise-data coupling scheme to adaptively pair noise with data. With only two sampling steps, VCT achieves state-of-the-art FID among non-distillation CT approaches on CIFAR-10 and matches SoTA on ImageNet 64x64.

Consistency Training (CT) has recently emerged as a strong alternative to diffusion models for image generation. However, non-distillation CT often suffers from high variance and instability, motivating ongoing research into its training dynamics. We propose Variational Consistency Training (VCT), a flexible and effective framework compatible with various forward kernels, including those in flow matching. Its key innovation is a learned noise-data coupling scheme inspired by Variational Autoencoders, where a data-dependent encoder models noise emission. This enables VCT to adaptively learn noise-to-data pairings, reducing training variance relative to the fixed, unsorted pairings in classical CT. Experiments on multiple image datasets demonstrate significant improvements: our method surpasses baselines, achieves state-of-the-art FID among non-distillation CT approaches on CIFAR-10, and matches SoTA performance on ImageNet 64 x 64 with only two sampling steps. Code is available at https://github.com/sony/vct.
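
The coupling idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). It assumes a hypothetical linear encoder (`encode_noise`, `W_mu`, `W_logvar`), a VAE-style KL regularizer toward a standard normal prior, and two simple forward kernels. All of these names and choices are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def encode_noise(x, W_mu, W_logvar, rng):
    """Hypothetical linear encoder: maps data x to a Gaussian over noise, then samples it."""
    mu = x @ W_mu
    logvar = x @ W_logvar
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps  # reparameterization trick
    return z, mu, logvar

def kl_to_standard_normal(mu, logvar):
    """Per-sample KL( N(mu, diag(exp(logvar))) || N(0, I) ). Assumed regularizer."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def forward_kernel(x, z, t, kind="flow_matching"):
    """Noisy sample x_t built from a (data, noise) pair; both kernels are illustrative."""
    t = np.reshape(t, (-1, 1))
    if kind == "flow_matching":
        return (1.0 - t) * x + t * z  # linear interpolation between data and noise
    if kind == "variance_exploding":
        return x + t * z  # additive noise scaled by t
    raise ValueError(f"unknown kernel: {kind}")

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # toy batch standing in for images

# With zero weights the encoder ignores x: the noise is plain N(0, I),
# which corresponds to the fixed, independent pairing of classical CT.
W_mu = np.zeros((8, 8))
W_logvar = np.zeros((8, 8))

z, mu, logvar = encode_noise(x, W_mu, W_logvar, rng)
kl = kl_to_standard_normal(mu, logvar)
t = rng.uniform(0.0, 1.0, size=4)
x_t = forward_kernel(x, z, t)
```

In this sketch, training the encoder weights would make the noise depend on the data, and the KL term would keep the learned noise distribution close to the prior used at sampling time. How VCT actually parameterizes and weights these pieces is specified in the paper and code, not here.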
