Edit Flows: Flow Matching with Edit Operations
Edit Flows targets a bottleneck in non-autoregressive generative models for sequence data: variable-length generation. It offers a more flexible approach for tasks such as image captioning and code generation, though the contribution appears incremental, since it builds on existing flow matching and edit-operation concepts.
The paper tackles the difficulty non-autoregressive models have with variable-length sequence generation. It proposes Edit Flows, a model that applies edit operations (insertions, deletions, and substitutions) within a Continuous-time Markov Chain over the sequence space. Edit Flows outperforms both autoregressive and mask models on image captioning and significantly outperforms the mask construction on text and code generation.
Autoregressive generative models naturally generate variable-length sequences, while non-autoregressive models struggle, often imposing rigid, token-wise structures. We propose Edit Flows, a non-autoregressive model that overcomes these limitations by defining a discrete flow over sequences through edit operations: insertions, deletions, and substitutions. By modeling these operations within a Continuous-time Markov Chain over the sequence space, Edit Flows enables flexible, position-relative generation that aligns more closely with the structure of sequence data. Our training method leverages an expanded state space with auxiliary variables, making the learning process efficient and tractable. Empirical results show that Edit Flows outperforms both autoregressive and mask models on image captioning and significantly outperforms the mask construction in text and code generation.
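To make the three edit operations concrete, the following is a minimal Python sketch, not the authors' implementation. The names `apply_edit`, `uniform_rates`, and `euler_step` are hypothetical, the uniform rate table is a stand-in for a learned rate model, and the step function is a simplified first-order approximation of a Continuous-time Markov Chain that fires at most one edit per step.

```python
import random
from typing import Dict, List, Optional, Tuple

# (operation, position, token); token is None for deletions.
Edit = Tuple[str, int, Optional[str]]


def apply_edit(seq: List[str], edit: Edit) -> List[str]:
    """Return a new sequence with a single edit operation applied."""
    op, i, tok = edit
    if op == "ins":  # insert tok before position i (i == len(seq) appends)
        return seq[:i] + [tok] + seq[i:]
    if op == "del":  # remove the token at position i
        return seq[:i] + seq[i + 1:]
    if op == "sub":  # replace the token at position i with tok
        return seq[:i] + [tok] + seq[i + 1:]
    raise ValueError(f"unknown edit operation: {op}")


def uniform_rates(seq: List[str], vocab: List[str], rate: float = 1.0) -> Dict[Edit, float]:
    """Assumption: a placeholder for a learned model, giving every valid edit the same rate."""
    rates: Dict[Edit, float] = {}
    for i in range(len(seq) + 1):
        for tok in vocab:
            rates[("ins", i, tok)] = rate
    for i in range(len(seq)):
        rates[("del", i, None)] = rate
        for tok in vocab:
            if tok != seq[i]:
                rates[("sub", i, tok)] = rate
    return rates


def euler_step(seq: List[str], rates: Dict[Edit, float], h: float, rng: random.Random) -> List[str]:
    """Simplified step of size h: with probability min(1, total_rate * h),
    apply one edit chosen in proportion to its rate; otherwise stay put."""
    total = sum(rates.values())
    if total == 0 or rng.random() >= min(1.0, total * h):
        return seq
    edits = list(rates)
    chosen = rng.choices(edits, weights=[rates[e] for e in edits], k=1)[0]
    return apply_edit(seq, chosen)


# Deterministic demonstration: the sequence length changes as edits are applied.
caption = ["a", "cat", "sat"]
caption = apply_edit(caption, ("ins", 1, "black"))
caption = apply_edit(caption, ("sub", 3, "sleeps"))
caption = apply_edit(caption, ("del", 0, None))
```

Because insertions and deletions shift the positions of later tokens, the sequence length is free to change during generation, which is the property a fixed-length, token-wise construction lacks.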