Convergence Analysis of Flow Matching in Latent Space with Transformers
This work offers theoretical foundations for ODE-based generative models; the contribution is incremental but important for researchers in generative AI.
The paper provides theoretical convergence guarantees for flow matching generative models. A pre-trained autoencoder maps data to a latent space, where a transformer predicts the velocity field; the analysis shows that generated samples converge to the target distribution in Wasserstein-2 distance under mild and practical assumptions.
We present theoretical convergence guarantees for ODE-based generative models, specifically flow matching. We use a pre-trained autoencoder network to map high-dimensional original inputs to a low-dimensional latent space, where a transformer network is trained to predict the velocity field of the transformation from a standard normal distribution to the target latent distribution. Our error analysis demonstrates the effectiveness of this approach, showing that the distribution of samples generated via the estimated ODE flow converges to the target distribution in the Wasserstein-2 distance under mild and practical assumptions. Furthermore, we show that arbitrary smooth functions can be effectively approximated by transformer networks with Lipschitz continuity, which may be of independent interest.
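To make the pipeline in the abstract concrete, the following is a minimal sketch of flow matching in a one-dimensional latent space. It is not the paper's setup, and several parts are assumptions made for illustration: there is no autoencoder (the latents are drawn directly from a Gaussian), the transformer is replaced by a per-time-step linear least-squares fit, the probability path is assumed to be the linear interpolation between noise and data, and the ODE is integrated with forward Euler. The function names (`fit_velocity`, `sample_flow`, `gaussian_w2`) are hypothetical. The Wasserstein-2 check uses the closed form for one-dimensional Gaussians, so it compares a Gaussian fitted to the samples against the target rather than computing a general W2 distance.

```python
import numpy as np

def fit_velocity(x1, z, t):
    """Least-squares fit of v(x, t) = a + b * x to the regression target x1 - z.

    Assumes the linear path x_t = (1 - t) * z + t * x1, whose time
    derivative is x1 - z. A linear model stands in for the transformer.
    """
    xt = (1.0 - t) * z + t * x1                      # point on the assumed path
    target = x1 - z                                  # velocity along that path
    design = np.stack([np.ones_like(xt), xt], axis=1)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef                                      # array [a, b]

def sample_flow(latents, n_steps=200, n_samples=20000, seed=0):
    """Push N(0, 1) samples through the estimated ODE with forward Euler."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)               # start from standard normal
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        # Fresh noise paired with the training latents for this time step.
        z = rng.standard_normal(latents.shape[0])
        a, b = fit_velocity(latents, z, t)
        x = x + dt * (a + b * x)                     # Euler step along v(x, t)
    return x

def gaussian_w2(samples, mu, sigma):
    """W2 between the Gaussian fitted to `samples` and N(mu, sigma^2) in 1D."""
    return np.sqrt((samples.mean() - mu) ** 2 + (samples.std() - sigma) ** 2)

# Toy target latent distribution (an assumption, not from the paper).
rng = np.random.default_rng(1)
mu, sigma = 2.0, 0.5
latents = mu + sigma * rng.standard_normal(50000)

generated = sample_flow(latents)
print("Gaussian W2 to target:", gaussian_w2(generated, mu, sigma))
```

In this toy case the true velocity field is linear in x, so the linear fit can represent it exactly and the remaining error comes from finite samples and Euler discretization. In the paper's setting the velocity field is generally nonlinear, which is where the approximation power of the transformer enters the analysis.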