SD AI AS | Feb 14, 2024

Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based Controls

Liwei Lin, Gus Xia, Yixiao Zhang, Junyan Jiang

ByteDance

arXiv:2402.09508v3 | 14.822 citations | h-index: 24 | Has Code | IJCAI

Originality: Incremental advance

AI Analysis

This work addresses the need for more versatile AI tools in human-AI music co-creation, though it is incremental as it builds on existing models like MusicGen.

The paper tackles the limitation of autoregressive large language models in music editing by introducing a parameter-efficient heterogeneous adapter with masking training, enabling tasks like inpainting, arrangement, and refinement. The method, applied to MusicGen, shows promising results in flexible music editing controls.

Controllable music generation plays a vital role in human-AI music co-creation. While Large Language Models (LLMs) have shown promise in generating high-quality music, their focus on autoregressive generation limits their utility in music editing tasks. To address this gap, we propose a novel approach leveraging a parameter-efficient heterogeneous adapter combined with a masking training scheme. This approach enables autoregressive language models to seamlessly address music inpainting tasks. Additionally, our method integrates frame-level content-based controls, facilitating track-conditioned music refinement and score-conditioned music arrangement. We apply this method to fine-tune MusicGen, a leading autoregressive music generation model. Our experiments demonstrate promising results across multiple music editing tasks, offering more flexible controls for future AI-driven music editing tools. The source codes and a demo page showcasing our work are available at https://kikyo-16.github.io/AIR.
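To make the idea of masked training for inpainting concrete, here is a minimal sketch of how one training example might be built from a sequence of frame-level tokens. It is an illustration only, not the paper's implementation: the function name `make_inpainting_example`, the `MASK_ID` placeholder, and the choice of a single contiguous span are all assumptions, and the abstract does not specify how masks are sampled or how the adapter consumes them.

```python
from typing import List, Tuple

MASK_ID = -1  # hypothetical placeholder id; not taken from the paper


def make_inpainting_example(
    tokens: List[int], start: int, end: int, mask_id: int = MASK_ID
) -> Tuple[List[int], List[int], List[bool]]:
    """Hide tokens[start:end] and return (masked_input, targets, is_masked).

    The unmasked tokens act as the surrounding context; the hidden span
    is what a model would be trained to reconstruct.
    """
    if not 0 <= start < end <= len(tokens):
        raise ValueError("span must satisfy 0 <= start < end <= len(tokens)")
    # Per-frame flag: True where the frame is hidden from the model.
    is_masked = [start <= i < end for i in range(len(tokens))]
    # Replace hidden frames with the placeholder id, keep the rest.
    masked_input = [mask_id if m else t for t, m in zip(tokens, is_masked)]
    # The original values of the hidden span are the training targets.
    targets = tokens[start:end]
    return masked_input, targets, is_masked


# Usage: hide frames 2 and 3 of a six-frame toy sequence.
frames = [11, 12, 13, 14, 15, 16]
masked, targets, flags = make_inpainting_example(frames, 2, 4)
```

In this toy setup the context on both sides of the gap stays visible, which is the property that distinguishes inpainting from plain left-to-right continuation.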
