Parameter-Efficient Transfer Learning for Music Foundation Models
This work addresses the computational cost and overfitting of transfer learning for music foundation models, although the contribution is incremental: it adapts existing PETL techniques to the music domain.
The paper tackles the challenge of adapting music foundation models to downstream tasks by applying parameter-efficient transfer learning (PETL) methods, which outperform probing and fine-tuning on music auto-tagging and achieve results similar to fine-tuning on key detection and tempo estimation at significantly lower training cost.
A growing number of music foundation models have recently been released, promising a general, mostly task-independent encoding of musical information. The two common ways of adapting these models to downstream tasks are probing and fine-tuning, and both face challenges. Probing may lead to suboptimal performance because the pre-trained weights are frozen, while fine-tuning is computationally expensive and prone to overfitting. Our work investigates the use of parameter-efficient transfer learning (PETL) for music foundation models, which combines the advantages of probing and fine-tuning. We introduce three types of PETL methods: adapter-based methods, prompt-based methods, and reparameterization-based methods. These methods train only a small number of parameters and therefore do not require significant computational resources. Results show that PETL methods outperform both probing and fine-tuning on music auto-tagging. On key detection and tempo estimation, they achieve results similar to fine-tuning at significantly lower training cost. However, training a small model from scratch achieves similar results on key and tempo tasks, which calls into question the usefulness of the current generation of foundation models for these tasks. Code is available at https://github.com/suncerock/peft-music/
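To make the claim that PETL methods "train only a small number of parameters" concrete, the following sketch counts trainable parameters for full fine-tuning and for one generic variant of each of the three PETL families. Everything numeric here is an assumption for illustration and is not taken from the paper: the encoder shape (12 layers, hidden size 768, feed-forward size 3072), the adapter bottleneck of 64, the 16 prompt tokens per layer, the LoRA rank of 8, and the choice to apply the low-rank update to the query and value projections only. The task-specific output head is left out of every count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical Transformer encoder shape (assumed, not from the paper)."""
    num_layers: int = 12
    hidden_dim: int = 768
    ffn_dim: int = 3072


def layer_params(cfg: EncoderConfig) -> int:
    """Weights and biases in one standard Transformer encoder layer."""
    d, f = cfg.hidden_dim, cfg.ffn_dim
    attention = 4 * (d * d + d)               # Q, K, V and output projections
    feed_forward = (d * f + f) + (f * d + d)  # two linear layers with biases
    layer_norms = 2 * (2 * d)                 # two LayerNorms, scale and shift
    return attention + feed_forward + layer_norms


def full_finetune_params(cfg: EncoderConfig) -> int:
    """Fine-tuning updates every encoder weight."""
    return cfg.num_layers * layer_params(cfg)


def adapter_params(cfg: EncoderConfig, bottleneck: int = 64) -> int:
    """Adapter-based: one bottleneck MLP (down, up, biases) per layer (assumed)."""
    d = cfg.hidden_dim
    return cfg.num_layers * ((d * bottleneck + bottleneck) + (bottleneck * d + d))


def prompt_params(cfg: EncoderConfig, num_prompts: int = 16) -> int:
    """Prompt-based: learnable tokens prepended at every layer input (assumed)."""
    return cfg.num_layers * num_prompts * cfg.hidden_dim


def lora_params(cfg: EncoderConfig, rank: int = 8) -> int:
    """Reparameterization-based: rank-r update on Q and V projections (assumed)."""
    d = cfg.hidden_dim
    return cfg.num_layers * 2 * (d * rank + rank * d)


cfg = EncoderConfig()
total = full_finetune_params(cfg)
counts = {
    "fine-tuning": total,
    "adapter": adapter_params(cfg),
    "prompt": prompt_params(cfg),
    "LoRA": lora_params(cfg),
}
for name, n in counts.items():
    print(f"{name:12s} {n:>12,d} trainable ({100 * n / total:.2f}% of encoder)")
```

Under these assumed settings, each PETL variant trains under 2% of the encoder weights. The exact ratios depend on the foundation model and on the hyperparameters actually used, so the printed numbers illustrate the order of magnitude only.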