SDAICLASFeb 28, 2025

InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation

arXiv:2503.00084v116 citationsh-index: 18Has Code
Originality Incremental advance
AI Analysis

This work addresses the challenge of producing controllable, high-quality long-form audio for music generation applications, though it appears incremental as it builds on existing transformer and flow-matching methods.

The authors tackled the problem of generating high-fidelity, long-form music by integrating super-resolution and a large language model, achieving coherent music generation of up to 8 minutes with performance comparable to top-tier systems like MusicGen and Stable Audio 2.0.

We introduce InspireMusic, a framework integrated super resolution and large language model for high-fidelity long-form music generation. A unified framework generates high-fidelity music, songs, and audio, which incorporates an autoregressive transformer with a super-resolution flow-matching model. This framework enables the controllable generation of high-fidelity long-form music at a higher sampling rate from both text and audio prompts. Our model differs from previous approaches, as we utilize an audio tokenizer with one codebook that contains richer semantic information, thereby reducing training costs and enhancing efficiency. This combination enables us to achieve high-quality audio generation with long-form coherence of up to $8$ minutes. Then, an autoregressive transformer model based on Qwen 2.5 predicts audio tokens. Next, we employ a super-resolution flow-matching model to generate high-sampling rate audio with fine-grained details learned from an acoustic codec model. Comprehensive experiments show that the InspireMusic-1.5B-Long model has a comparable performance to recent top-tier open-source systems, including MusicGen and Stable Audio 2.0, on subjective and objective evaluations. The code and pre-trained models are released at https://github.com/FunAudioLLM/InspireMusic.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes