SD, AI, LG, MM, AS · Nov 6, 2025

MusRec: Zero-Shot Text-to-Music Editing via Rectified Flow and Diffusion Transformers

arXiv:2511.04376v3
Originality: Incremental advance
AI Analysis

MusRec addresses the need for practical, controllable music editing in applications like video game and film production, though it appears incremental because it builds on existing rectified flow and diffusion transformer techniques.

The paper tackles the problem of zero-shot text-to-music editing for real-world music, overcoming limitations like model-specific restrictions and precise prompt requirements. The result shows that MusRec outperforms existing methods in preserving musical content, structural consistency, and editing fidelity.

Music editing has emerged as an important and practical area of artificial intelligence, with applications ranging from video game and film music production to personalizing existing tracks according to user preferences. However, existing models face significant limitations, such as being restricted to editing synthesized music generated by their own models, requiring highly precise prompts, or necessitating task-specific retraining, thus lacking true zero-shot capability. Leveraging recent advances in rectified flow and diffusion transformers, we introduce MusRec, a zero-shot text-to-music editing model capable of performing diverse editing tasks on real-world music efficiently and effectively. Experimental results demonstrate that our approach outperforms existing methods in preserving musical content, structural consistency, and editing fidelity, establishing a strong foundation for controllable music editing in real-world scenarios.

