SD · AI · Sep 17, 2025

RFM-Editing: Rectified Flow Matching for Text-guided Audio Editing

arXiv:2509.14003v1 · 2 citations · h-index: 12
Originality: Incremental advance
AI Analysis

This paper addresses precise and efficient text-guided audio editing for multimedia and AI applications, and represents an incremental improvement over existing methods.

The paper tackles text-guided audio editing by proposing an end-to-end rectified flow matching diffusion framework, achieving faithful semantic alignment without auxiliary captions or masks and maintaining competitive editing quality in experiments.

Diffusion models have shown remarkable progress in text-to-audio generation. However, text-guided audio editing remains in its early stages. This task focuses on modifying the target content within an audio signal while preserving the rest, thus demanding precise localization and faithful editing according to the text prompt. Existing training-based and zero-shot methods that rely on full captions or costly optimization often struggle with complex editing or lack practicality. In this work, we propose a novel, efficient, end-to-end rectified flow matching-based diffusion framework for audio editing, and construct a dataset featuring overlapping multi-event audio to support training and benchmarking in complex scenarios. Experiments show that our model achieves faithful semantic alignment without requiring auxiliary captions or masks, while maintaining competitive editing quality across metrics.
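For readers unfamiliar with the term, the generic rectified flow matching objective can be sketched in a few lines. This is a minimal illustration of the standard technique, not the paper's implementation. It omits the text and audio conditioning entirely, and `predict_velocity` is a hypothetical stand-in for a trained network. The idea is to draw a straight line between a noise sample `x0` and a data sample `x1`, train the model to predict the constant velocity `x1 - x0` along that line, and then generate by integrating the predicted velocity from t=0 to t=1.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target: the constant velocity x1 - x0 of the straight path."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    v_pred = predict_velocity(x_t, t)  # hypothetical model call
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)


def euler_sample(predict_velocity, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        v = predict_velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy check with an oracle that already knows the true velocity.
noise = [0.0, 1.0]
data = [2.0, -1.0]
oracle = lambda x, t: target_velocity(noise, data)
loss = flow_matching_loss(oracle, noise, data, t=0.3)
sample = euler_sample(oracle, noise, num_steps=10)
```

With the oracle velocity the loss is zero and the Euler integration lands on `data` up to floating-point error. A learned model only approximates this, and in that setting the number of integration steps trades speed against accuracy.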

