SD · LG · AS · Jul 7, 2025

Music Boomerang: Reusing Diffusion Models for Data Augmentation and Audio Manipulation

arXiv:2507.04864v1 · h-index: 7
Originality: Synthesis-oriented
AI Analysis

The paper addresses data scarcity in audio tasks such as beat tracking, as well as audio content manipulation, but the contribution is incremental because it adapts an existing method to a new domain.

The paper applies Boomerang sampling, a method from image generation, to audio using Stable Audio Open, for both data augmentation and instrument replacement. It shows that the method mostly preserves rhythmic structure, improves beat tracker performance in low-data scenarios, and enables text-based instrument replacement on monophonic inputs.

Generative models of music audio are typically used to generate output based solely on a text prompt or melody. Boomerang sampling, recently proposed for the image domain, allows generating output close to an existing example, using any pretrained diffusion model. In this work, we explore its application in the audio domain as a tool for data augmentation or content manipulation. Specifically, implementing Boomerang sampling for Stable Audio Open, we augment training data for a state-of-the-art beat tracker, and attempt to replace musical instruments in recordings. Our results show that the rhythmic structure of existing examples is mostly preserved, that it improves performance of the beat tracker, but only in scenarios of limited training data, and that it can accomplish text-based instrument replacement on monophonic inputs. We publish our implementation to invite experiments on data augmentation in other tasks and explore further applications.
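The core idea of Boomerang sampling described in the abstract (partially noise an existing example, then denoise it with a pretrained diffusion model) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation and does not use Stable Audio Open: the cosine noise schedule, the deterministic DDIM-style update, and the names `alpha_bar`, `toy_predict_noise`, `boomerang`, and `t_boom` are all assumptions made for illustration. The toy noise predictor is exact only for data drawn from a standard Gaussian and stands in for a real pretrained network.

```python
import numpy as np


def alpha_bar(t):
    """Assumed cosine schedule: signal level is 1 at t=0 (clean) and 0 at t=1 (pure noise)."""
    return np.cos(0.5 * np.pi * t) ** 2


def toy_predict_noise(x_t, t):
    """Hypothetical stand-in for a pretrained network.

    For data distributed as N(0, I), the optimal noise estimate is
    E[eps | x_t] = sqrt(1 - alpha_bar(t)) * x_t.
    """
    return np.sqrt(1.0 - alpha_bar(t)) * x_t


def boomerang(x0, t_boom, predict_noise, num_steps=50, rng=None):
    """Noise x0 up to level t_boom, then denoise back to t=0.

    Small t_boom keeps the result close to x0; larger t_boom allows more change.
    """
    if not 0.0 <= t_boom < 1.0:
        raise ValueError("t_boom must be in [0, 1)")
    if t_boom == 0.0:
        return x0.copy()  # no noise added, nothing to denoise
    rng = np.random.default_rng() if rng is None else rng

    # Forward step: jump directly to the intermediate noise level.
    a = alpha_bar(t_boom)
    x = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * rng.standard_normal(x0.shape)

    # Reverse steps: deterministic DDIM-style updates from t_boom down to 0.
    ts = np.linspace(t_boom, 0.0, num_steps + 1)
    for t, t_prev in zip(ts[:-1], ts[1:]):
        a_t, a_prev = alpha_bar(t), alpha_bar(t_prev)
        eps = predict_noise(x, t)
        x0_hat = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps
    return x


if __name__ == "__main__":
    x0 = np.random.default_rng(0).standard_normal(1000)
    for t_boom in (0.2, 0.8):
        out = boomerang(x0, t_boom, toy_predict_noise, rng=np.random.default_rng(1))
        print(t_boom, float(np.linalg.norm(out - x0)))
```

In this sketch `t_boom` controls the trade-off that matters for augmentation: a low value yields a variation that stays near the input, while a high value drifts further from it. In a real setting the vector `x0` would be the latent of an audio clip and `predict_noise` a pretrained, possibly text-conditioned, model.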

