CV Sep 29, 2025

Diffusion Bridge or Flow Matching? A Unifying Framework and Comparative Analysis

Kaizhen Zhu, Mokai Pan, Zhechuan Yu, Jingya Wang, Jingyi Yu, Ye Shi

arXiv:2509.24531v110.23 citations
h-index: 15
Has Code

Originality Highly original

AI Analysis

This work clarifies the relative merits of Diffusion Bridge and Flow Matching, two popular generative models, and gives machine learning researchers and practitioners guidance on choosing between them, based on theoretical analysis and empirical evidence.

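The paper addresses the open question of whether Diffusion Bridge or Flow Matching is preferable by analyzing both in a single theoretical framework and comparing them experimentally. Recast as Stochastic Optimal Control problems, Diffusion Bridge has the lower cost function, which the authors link to more stable trajectories. From an Optimal Transport perspective, the interpolation coefficients of Flow Matching become less effective as the training set shrinks. Experiments across six tasks, varying both task difficulty and training data size, agree with these predictions.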

Diffusion Bridge and Flow Matching have both demonstrated compelling empirical performance in transformation between arbitrary distributions. However, there remains confusion about which approach is generally preferable, and the substantial discrepancies in their modeling assumptions and practical implementations have hindered a unified theoretical account of their relative merits. We have, for the first time, provided a unified theoretical and experimental validation of these two models. We recast their frameworks through the lens of Stochastic Optimal Control and prove that the cost function of the Diffusion Bridge is lower, guiding the system toward more stable and natural trajectories. Simultaneously, from the perspective of Optimal Transport, interpolation coefficients $t$ and $1-t$ of Flow Matching become increasingly ineffective when the training data size is reduced. To corroborate these theoretical claims, we propose a novel, powerful architecture for Diffusion Bridge built on a latent Transformer, and implement a Flow Matching model with the same structure to enable a fair performance comparison in various experiments. Comprehensive experiments are conducted across Image Inpainting, Super-Resolution, Deblurring, Denoising, Translation, and Style Transfer tasks, systematically varying both the distributional discrepancy (different difficulty) and the training data size. Extensive empirical results align perfectly with our theoretical predictions and allow us to delineate the respective advantages and disadvantages of these two models. Our code is available at https://anonymous.4open.science/r/DBFM-3E8E/.
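The abstract refers to the interpolation coefficients $t$ and $1-t$ of Flow Matching without giving formulas. As a minimal sketch, assuming the standard linear interpolant for Flow Matching and a plain Brownian bridge with constant noise scale for the Diffusion Bridge (the paper's own bridge and its latent Transformer are not reproduced here), the two ways of building an intermediate sample between paired endpoints could look like this. The function names and the `sigma` parameter are illustrative and do not come from the paper.

```python
import numpy as np


def flow_matching_pair(x0, x1, t):
    """Linear interpolant x_t = (1 - t) * x0 + t * x1 and its velocity target.

    Assumption: the standard (rectified-flow style) Flow Matching path,
    whose time derivative is the constant vector x1 - x0.
    """
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target


def brownian_bridge_sample(x0, x1, t, sigma, rng):
    """Sample from a Brownian bridge pinned at x0 (t = 0) and x1 (t = 1).

    Assumption: constant noise scale sigma, so the marginal at time t is
    Gaussian with mean (1 - t) * x0 + t * x1 and variance sigma^2 * t * (1 - t).
    """
    mean = (1.0 - t) * x0 + t * x1
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(x0.shape)


rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 2))        # toy source samples
x1 = rng.standard_normal((4, 2)) + 3.0  # toy target samples
t = 0.25

x_t, v = flow_matching_pair(x0, x1, t)
x_bridge = brownian_bridge_sample(x0, x1, t, sigma=0.5, rng=rng)
```

In this sketch both constructions share the same mean path. The bridge adds noise that vanishes at both endpoints, while the Flow Matching path is deterministic once the pair `(x0, x1)` is fixed.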
