ML · LG · Nov 12, 2025

Branching Flows: Discrete, Continuous, and Manifold Flow Matching with Splits and Deletions

arXiv:2511.09465v1 · 2 citations · h-index: 3
Originality: Highly original
AI Analysis

The paper addresses a key limitation of diffusion and flow matching methods in domains such as drug discovery, where sequence length is not fixed in advance, and offers a novel approach to variable-length generative modeling.

The paper tackles generative modeling for sequences of variable length, such as molecules or proteins, by proposing Branching Flows, a framework in which elements branch and die stochastically over a forest of binary trees so that the model controls the element count during generation. It demonstrates the approach on small molecule, antibody sequence, and protein backbone generation, reporting a stable learning objective and new capabilities.

Diffusion and flow matching approaches to generative modeling have shown promise in domains where the state space is continuous, such as image generation or protein folding & design, and discrete, exemplified by diffusion large language models. They offer a natural fit when the number of elements in a state is fixed in advance (e.g. images), but require ad hoc solutions when, for example, the length of a response from a large language model, or the number of amino acids in a protein chain, is not known a priori. Here we propose Branching Flows, a generative modeling framework that, like diffusion and flow matching approaches, transports a simple distribution to the data distribution. But in Branching Flows, the elements in the state evolve over a forest of binary trees, branching and dying stochastically with rates that are learned by the model. This allows the model to control, during generation, the number of elements in the sequence. We also show that Branching Flows can compose with any flow matching base process on discrete sets, continuous Euclidean spaces, smooth manifolds, and 'multimodal' product spaces that mix these components. We demonstrate this in three domains: small molecule generation (multimodal), antibody sequence generation (discrete), and protein backbone generation (multimodal), and show that Branching Flows is a capable distribution learner with a stable learning objective, and that it enables new capabilities.
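To make the branching-and-dying idea concrete, here is a minimal toy sketch of how stochastic split and deletion events can change the number of elements over a generation interval t in [0, 1]. It is not the paper's algorithm: the function name `simulate_branching`, the constant `split_rate` and `death_rate` (the abstract says the rates are learned by the model), the Euler time discretization, and the placeholder element state are all assumptions made for illustration.

```python
import random


def simulate_branching(n_init, split_rate, death_rate, n_steps=100, seed=0):
    """Toy Euler simulation of split/deletion events over t in [0, 1].

    Hypothetical illustration only: rates are constants here, whereas the
    source describes rates learned by a model.
    """
    dt = 1.0 / n_steps
    if (split_rate + death_rate) * dt > 1.0:
        raise ValueError("rates too large for this step size")
    rng = random.Random(seed)
    # Each element carries a placeholder state; a real base process
    # (discrete, Euclidean, or manifold) would update it at every step.
    elements = [0.0] * n_init
    for _ in range(n_steps):
        next_elements = []
        for state in elements:
            u = rng.random()
            if u < death_rate * dt:
                continue  # deletion: the element is dropped
            if u < (death_rate + split_rate) * dt:
                next_elements.extend([state, state])  # split into two children
            else:
                next_elements.append(state)  # element survives unchanged
        elements = next_elements
    return elements


if __name__ == "__main__":
    final = simulate_branching(n_init=4, split_rate=1.0, death_rate=0.5)
    print(len(final))
```

With both rates set to zero the element count stays at its initial value; raising `split_rate` relative to `death_rate` tends to grow the sequence, which is the lever a learned model would use to control length.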

