SD | AI | Feb 18, 2025

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

arXiv:2502.13128v2 | 39 citations | h-index: 32 | Has Code | ICML
Originality: Incremental advance
AI Analysis

SongGen targets the cumbersome, error-prone multi-stage pipelines used for song generation, a problem relevant to musicians and AI researchers. Its contribution is an incremental improvement: a single unified model in place of separate stages.

The paper tackles text-to-song generation by proposing SongGen, a single-stage autoregressive transformer that generates vocals and accompaniment from text inputs, achieving improved quality and controllability compared to multi-stage methods.

Text-to-song generation, the task of creating vocals and accompaniment from textual inputs, poses significant challenges due to domain complexity and data scarcity. Existing approaches often employ multi-stage generation procedures, leading to cumbersome training and inference pipelines, as well as suboptimal overall generation quality due to error accumulation across stages. In this paper, we propose SongGen, a fully open-source, single-stage auto-regressive transformer designed for controllable song generation. The proposed model facilitates fine-grained control over diverse musical attributes, including lyrics and textual descriptions of instrumentation, genre, mood, and timbre, while also offering an optional three-second reference clip for voice cloning. Within a unified auto-regressive framework, SongGen supports two output modes: mixed mode, which generates a mixture of vocals and accompaniment directly, and dual-track mode, which synthesizes them separately for greater flexibility in downstream applications. We explore diverse token pattern strategies for each mode, leading to notable improvements and valuable insights. Furthermore, we design an automated data preprocessing pipeline with effective quality control. To foster community engagement and future research, we will release our model weights, training code, annotated data, and preprocessing pipeline. The code is available at https://github.com/LiuZH-19/SongGen.
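To make the dual-track idea concrete, here is a minimal sketch of one way two token streams could share a single autoregressive sequence: frame-level interleaving. This is an illustrative assumption, not the token pattern the paper uses; the abstract says only that several pattern strategies were explored. The function names `interleave_tracks` and `split_tracks` are hypothetical, and the integer tokens stand in for real audio codec tokens.

```python
from typing import List, Tuple


def interleave_tracks(vocal: List[int], accomp: List[int]) -> List[int]:
    """Merge two equal-length token tracks into one sequence.

    Hypothetical pattern: for each frame, emit the vocal token and then
    the accompaniment token, so one decoder can predict both tracks.
    """
    if len(vocal) != len(accomp):
        raise ValueError("tracks must have the same number of frames")
    merged: List[int] = []
    for v, a in zip(vocal, accomp):
        merged.extend((v, a))
    return merged


def split_tracks(merged: List[int]) -> Tuple[List[int], List[int]]:
    """Invert interleave_tracks: even positions are vocal, odd are accompaniment."""
    if len(merged) % 2 != 0:
        raise ValueError("interleaved sequence must have even length")
    return merged[0::2], merged[1::2]


if __name__ == "__main__":
    # Toy tokens only; a real system would use codec token IDs.
    vocal_tokens = [1, 2, 3]
    accomp_tokens = [10, 20, 30]
    sequence = interleave_tracks(vocal_tokens, accomp_tokens)
    print(sequence)
    print(split_tracks(sequence))
```

Under this assumed layout, the two tracks can be recovered exactly after generation, which is the kind of flexibility for downstream use that the abstract attributes to dual-track mode. Mixed mode would instead need no splitting, since it predicts a single stream for the combined audio.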

Code Implementations: 1 repo