CV · AI · Sep 16, 2025

Runge-Kutta Approximation and Decoupled Attention for Rectified Flow Inversion and Semantic Editing

arXiv:2509.12888v1 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses practical challenges in image generation and editing, and represents an incremental improvement over existing rectified flow models.

The paper tackles low inversion accuracy and entangled multimodal attention in rectified flow models by proposing a Runge-Kutta-based inversion method and Decoupled Diffusion Transformer Attention, achieving state-of-the-art performance in image reconstruction and text-guided editing tasks.

Rectified flow (RF) models have recently demonstrated superior generative performance compared to DDIM-based diffusion models. However, in real-world applications, they suffer from two major challenges: (1) low inversion accuracy that hinders the consistency with the source image, and (2) entangled multimodal attention in diffusion transformers, which hinders precise attention control. To address the first challenge, we propose an efficient high-order inversion method for rectified flow models based on the Runge-Kutta solver of differential equations. To tackle the second challenge, we introduce Decoupled Diffusion Transformer Attention (DDTA), a novel mechanism that disentangles text and image attention inside the multimodal diffusion transformers, enabling more precise semantic control. Extensive experiments on image reconstruction and text-guided editing tasks demonstrate that our method achieves state-of-the-art performance in terms of fidelity and editability. Code is available at https://github.com/wmchen/RKSovler_DDTA.
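To make the idea of high-order inversion concrete, here is a minimal sketch of why a Runge-Kutta solver can reconstruct a source more faithfully than a first-order Euler solver. It is not the authors' implementation: it uses the classical fourth-order Runge-Kutta scheme, a hypothetical `toy_velocity` function standing in for the learned velocity network, and the assumed convention that t = 0 is the image and t = 1 is noise. Inversion integrates the ODE dx/dt = v(x, t) from image to noise, and reconstruction integrates it back with the same solver.

```python
import numpy as np

def euler_step(v, x, t, h):
    # First-order baseline: one velocity evaluation per step.
    return x + h * v(x, t)

def rk4_step(v, x, t, h):
    # Classical fourth-order Runge-Kutta: four velocity evaluations per step.
    k1 = v(x, t)
    k2 = v(x + 0.5 * h * k1, t + 0.5 * h)
    k3 = v(x + 0.5 * h * k2, t + 0.5 * h)
    k4 = v(x + h * k3, t + h)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate(v, x, t_start, t_end, num_steps, step):
    # Fixed-step integration; a negative step size runs the ODE backwards.
    h = (t_end - t_start) / num_steps
    t = t_start
    for _ in range(num_steps):
        x = step(v, x, t, h)
        t += h
    return x

def toy_velocity(x, t):
    # Hypothetical nonlinear velocity field (assumption), NOT a trained model.
    return np.sin(x) + t

def round_trip_error(x_image, step, num_steps=10):
    # Invert (image -> noise), then reconstruct (noise -> image).
    x_noise = integrate(toy_velocity, x_image, 0.0, 1.0, num_steps, step)
    x_recon = integrate(toy_velocity, x_noise, 1.0, 0.0, num_steps, step)
    return float(np.max(np.abs(x_recon - x_image)))

x_image = np.array([0.5, -1.0, 2.0])  # stand-in for image latents
err_euler = round_trip_error(x_image, euler_step)
err_rk4 = round_trip_error(x_image, rk4_step)
print(f"Euler round-trip error: {err_euler:.2e}")
print(f"RK4 round-trip error:   {err_rk4:.2e}")
```

With the same number of steps, the RK4 round trip should land much closer to the starting point than the Euler round trip, at the cost of four velocity evaluations per step instead of one. In a real rectified flow model each evaluation is a forward pass of the network, so the practical trade-off depends on how many steps the higher order saves.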

