SDAIASSep 30, 2024

Melody-Guided Music Generation

arXiv:2409.20196v44 citationsh-index: 11Has Code
Originality Highly original
AI Analysis

This work provides a more efficient method for text-to-music generation, which could benefit creators and developers with limited computational resources.

The Melody-Guided Music Generation (MG2) model tackles text-to-music generation by using melody as a guide. It outperforms current open-source text-to-music models while using less than 1/3 of the parameters or less than 1/200 of the training data.

We present the Melody-Guided Music Generation (MG2) model, a novel approach using melody to guide the text-to-music generation that, despite a simple method and limited resources, achieves excellent performance. Specifically, we first align the text with audio waveforms and their associated melodies using the newly proposed Contrastive Language-Music Pretraining, enabling the learned text representation fused with implicit melody information. Subsequently, we condition the retrieval-augmented diffusion module on both text prompt and retrieved melody. This allows MG2 to generate music that reflects the content of the given text description, meantime keeping the intrinsic harmony under the guidance of explicit melody information. We conducted extensive experiments on two public datasets: MusicCaps and MusicBench. Surprisingly, the experimental results demonstrate that the proposed MG2 model surpasses current open-source text-to-music generation models, achieving this with fewer than 1/3 of the parameters or less than 1/200 of the training data compared to state-of-the-art counterparts. Furthermore, we conducted comprehensive human evaluations involving three types of users and five perspectives, using newly designed questionnaires to explore the potential real-world applications of MG2.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes