LG ML · Apr 14, 2025

$α$-Flow: A Unified Framework for Continuous-State Discrete Flow Matching Models

Chaoran Cheng, Jiahan Li, Jiajun Fan, Ge Liu

arXiv:2504.10283v117.912 citations · h-index: 5

Originality: Incremental advance

AI Analysis

This work provides a unified framework for researchers in generative modeling, offering incremental improvements in performance for specific domains like image and protein sequence generation.

The authors unify continuous-state discrete flow matching (CS-DFM) models for discrete generative modeling under the $α$-Flow framework, which outperforms its discrete-state counterpart in image and protein sequence generation and better captures entropy in language modeling.

Recent efforts have extended the flow-matching framework to discrete generative modeling. One strand of models works directly with the continuous probabilities instead of discrete tokens, which we colloquially refer to as Continuous-State Discrete Flow Matching (CS-DFM). Existing CS-DFM models differ significantly in their representations and geometric assumptions. This work presents a unified framework for CS-DFM models, under which the existing variants can be understood as operating on different $α$-representations of probabilities. Building upon the theory of information geometry, we introduce $α$-Flow, a family of CS-DFM models that adheres to the canonical $α$-geometry of the statistical manifold, and demonstrate its optimality in minimizing the generalized kinetic energy. Theoretically, we show that the flow matching loss for $α$-Flow establishes a unified variational bound for the discrete negative log-likelihood. We comprehensively evaluate different instantiations of $α$-Flow on various discrete generation domains to demonstrate their effectiveness in discrete generative modeling, including intermediate values of $α$ whose geometries have never been explored before. $α$-Flow significantly outperforms its discrete-state counterpart in image and protein sequence generation and better captures the entropy in language modeling.
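To make the notion of an $α$-representation concrete, here is a minimal sketch using the standard definition from information geometry (Amari): $\frac{2}{1-α} p^{(1-α)/2}$ for $α \neq 1$ and $\log p$ for $α = 1$. This is an illustration only. The paper's own parameterization and scaling may differ, and the function name `alpha_representation` is hypothetical.

```python
import math

def alpha_representation(p, alpha):
    """Elementwise alpha-representation of a probability vector p.

    Uses the textbook information-geometry definition (an assumption here,
    not necessarily the paper's exact convention). Entries of p must be > 0
    when alpha == 1, because the logarithm is taken.
    """
    if alpha == 1:
        return [math.log(x) for x in p]
    exponent = (1 - alpha) / 2
    scale = 2 / (1 - alpha)
    return [scale * x ** exponent for x in p]

p = [0.5, 0.25, 0.25]
mixture = alpha_representation(p, -1)  # alpha = -1: the probabilities themselves
sphere = alpha_representation(p, 0)    # alpha = 0: 2*sqrt(p), a point on a sphere of radius 2
logits = alpha_representation(p, 1)    # alpha = 1: log-probabilities
```

Under this definition, $α = -1$ leaves the probabilities unchanged, $α = 0$ maps them onto a sphere, and $α = 1$ maps them to log-probabilities. Values strictly between these are the intermediate cases the abstract refers to.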
