SD · AI · MM · AS · Dec 14, 2024

Hidden Echoes Survive Training in Audio To Audio Generative Instrument Models

arXiv:2412.10649v1 · h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses the need for tagging generative audio models to ensure proper data licensing and elucidate black-box behavior, offering an incremental improvement using classical watermarking techniques.

The paper tackles the problem of tracing training data usage in generative audio models. It demonstrates that imperceptible echoes hidden in training data are reproduced in model outputs across several architectures. A single echo is the most robust option, while longer echo patterns increase information capacity.

As generative techniques pervade the audio domain, there has been increasing interest in tracing back through these complicated models to understand how they draw on their training data to synthesize new examples, both to ensure that they use properly licensed data and to elucidate their black-box behavior. In this paper, we show that if imperceptible echoes are hidden in the training data, a wide variety of audio-to-audio architectures (differentiable digital signal processing (DDSP), Realtime Audio Variational autoEncoder (RAVE), and "Dance Diffusion") will reproduce these echoes in their outputs. Hiding a single echo is particularly robust across all architectures, but we also show promising results hiding longer time-spread echo patterns for an increased information capacity. We conclude by showing that echoes make their way into fine-tuned models, that they survive mixing/demixing, and that they survive pitch-shift augmentation during training. Hence, this simple, classical idea in watermarking shows significant promise for tagging generative audio models.
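
To make the single-echo idea concrete, here is a minimal sketch of classical echo hiding and cepstrum-based detection. It is not taken from the paper: the function names (`add_echo`, `real_cepstrum`, `detect_echo`), the delay, and the echo amplitude are all illustrative assumptions, and the amplitude used in the usage note is far larger than anything that would be imperceptible.

```python
import numpy as np

def add_echo(x, delay, alpha):
    """Return x plus a copy of itself delayed by `delay` samples, scaled by `alpha`.

    `delay` and `alpha` are hypothetical parameters for illustration only.
    """
    y = np.asarray(x, dtype=np.float64).copy()
    y[delay:] += alpha * y[:-delay].copy()  # y still equals x here, so this adds alpha * x[n - delay]
    return y

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.rfft(x)) + 1e-12)  # small offset avoids log(0)
    return np.fft.irfft(log_mag, n=len(x))

def detect_echo(y, candidates):
    """Pick the candidate delay (in samples) with the largest cepstral value."""
    c = real_cepstrum(y)
    return max(candidates, key=lambda d: c[d])

# Usage: hide an echo in white noise, then recover its delay.
rng = np.random.default_rng(0)
x = rng.standard_normal(1 << 15)
y = add_echo(x, delay=75, alpha=0.2)
found = detect_echo(y, range(50, 101))
```

An echo at delay d multiplies the spectrum by 1 + alpha * exp(-i * omega * d), so the log magnitude picks up a ripple whose leading term is alpha * cos(omega * d), which appears as a peak at index d in the cepstrum. Under this reading, checking whether a model's output carries the tag amounts to looking for that peak at the delay used when preparing the training data.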

