JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning
This work addresses the need for customized music generation for users who want to incorporate specific audio concepts, and represents an incremental advance in text-to-music models.
The paper tackles the problem of generating music that captures specific concepts from reference audio, which text prompts alone cannot precisely convey. It proposes fine-tuning a pretrained text-to-music model with a pivotal parameters tuning approach that avoids overfitting, together with a strategy for handling multiple concepts, and the resulting method outperforms baselines in evaluations.
Large models for text-to-music generation have achieved significant progress, enabling the creation of high-quality and varied musical compositions from text prompts. However, input text prompts may not precisely capture user requirements, particularly when the objective is to generate music that embodies a specific concept derived from a designated reference collection. In this paper, we propose a novel method for customized text-to-music generation, which can capture a concept from two minutes of reference music and generate a new piece of music that conforms to that concept. We achieve this by fine-tuning a pretrained text-to-music model on the reference music. However, directly fine-tuning all parameters leads to overfitting. To address this problem, we propose a Pivotal Parameters Tuning method that enables the model to assimilate the new concept while preserving its original generative capabilities. Additionally, we identify a potential concept conflict when introducing multiple concepts into the pretrained model. We present a concept enhancement strategy to distinguish multiple concepts, enabling the fine-tuned model to generate music incorporating either individual concepts or multiple concepts simultaneously. Since we are the first to work on the customized music generation task, we also introduce a new dataset and evaluation protocol for it. Our proposed Jen1-DreamStyler outperforms several baselines in both qualitative and quantitative evaluations. Demos will be available at https://www.jenmusic.ai/research#DreamStyler.
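The abstract does not describe how pivotal parameters are chosen. As a rough illustration of the general idea of tuning only a subset of a pretrained model's parameters, the pure-Python sketch below assumes a hypothetical selection rule: rank each parameter tensor by its mean absolute gradient on the reference data, unfreeze the top `k`, and apply a plain SGD step to those tensors alone. The function names `select_pivotal` and `masked_sgd_step`, the gradient-magnitude criterion, and the toy tensor names are illustrative assumptions, not the paper's method.

```python
from typing import Dict, List, Set

# Toy stand-in for model weights: tensor name -> flat list of values.
Params = Dict[str, List[float]]


def select_pivotal(grads: Params, k: int) -> Set[str]:
    """Return the names of the k tensors with the largest mean |gradient|.

    HYPOTHETICAL criterion for illustration only; the abstract does not
    specify how pivotal parameters are selected.
    """
    scores = {name: sum(abs(g) for g in gs) / len(gs) for name, gs in grads.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def masked_sgd_step(params: Params, grads: Params, trainable: Set[str], lr: float) -> Params:
    """Apply one SGD update to trainable tensors only; all others stay frozen."""
    updated: Params = {}
    for name, values in params.items():
        if name in trainable:
            updated[name] = [p - lr * g for p, g in zip(values, grads[name])]
        else:
            updated[name] = list(values)  # frozen: copied unchanged
    return updated


# Toy example with made-up tensor names and values.
params = {"attn.q": [1.0, 2.0], "attn.k": [0.5, 0.5], "ffn.w": [3.0, -1.0]}
grads = {"attn.q": [0.2, -0.4], "attn.k": [0.01, 0.02], "ffn.w": [0.0, 0.1]}

pivotal = select_pivotal(grads, k=1)  # {"attn.q"}
new_params = masked_sgd_step(params, grads, pivotal, lr=0.5)
```

Freezing everything outside the selected set is what lets a model absorb a new concept while leaving most of its original weights, and hence much of its generative behaviour, untouched. A real implementation would operate on framework tensors and an optimizer rather than Python lists.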