Mitigating data replication in text-to-audio generative diffusion models through anti-memorization guidance
This work addresses a persistent challenge for generative audio applications that require original content, though the contribution appears incremental because it adapts an existing guidance technique to text-to-audio generation.
The paper tackles data replication in text-to-audio diffusion models by applying Anti-Memorization Guidance (AMG) at inference time to discourage memorization, and reports significant mitigation without compromising audio fidelity or semantic alignment.
A persistent challenge in generative audio models is data replication, where the model unintentionally generates parts of its training data during inference. In this work, we address this issue in text-to-audio diffusion models by exploring the use of anti-memorization strategies. We adopt Anti-Memorization Guidance (AMG), a technique that modifies the sampling process of pre-trained diffusion models to discourage memorization. Our study explores three types of guidance within AMG, each designed to reduce replication while preserving generation quality. We use Stable Audio Open as our backbone, leveraging its fully open-source architecture and training dataset. Our comprehensive experimental analysis suggests that AMG significantly mitigates memorization in diffusion-based text-to-audio generation without compromising audio fidelity or semantic alignment.
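The abstract says AMG works by modifying the sampling process of a pre-trained model rather than retraining it. The sketch below shows one way such an inference-time intervention could look. It assumes an epsilon-prediction diffusion model with x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps. It is not the paper's implementation and does not reproduce its three guidance types. The function `guided_step`, the retrieved `nearest_train` example, and the parameters `sim_threshold` and `repel_scale` are hypothetical. The similarity-triggered repulsion is a simplified stand-in for the actual guidance terms.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two arrays, flattened to vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def guided_step(x_t, eps_uncond, eps_cond, alpha_bar, nearest_train,
                cfg_scale=3.0, sim_threshold=0.9, repel_scale=0.5):
    """One hypothetical anti-memorization-guided noise estimate.

    Assumes an epsilon-prediction model where
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps.
    `nearest_train` stands in for the closest training example returned by
    some retrieval step (not shown); it is an illustrative input only.
    """
    # Standard classifier-free guidance on the text condition.
    eps = eps_uncond + cfg_scale * (eps_cond - eps_uncond)
    # Clean-sample estimate implied by the current noise estimate.
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar) * eps) / np.sqrt(alpha_bar)
    sim = cosine_similarity(x0_hat, nearest_train)
    if sim > sim_threshold:
        # Likely replication: push x0_hat away from the training example,
        # then re-derive eps so the (x_t, x0_hat, eps) relation still holds.
        x0_hat = x0_hat + repel_scale * (x0_hat - nearest_train)
        eps = (x_t - np.sqrt(alpha_bar) * x0_hat) / np.sqrt(1.0 - alpha_bar)
    return eps, x0_hat, sim
```

In a full sampler this would run at every denoising step. In that setting, `nearest_train` would come from a nearest-neighbour search over training-set embeddings, which is also an assumption here. When no near-copy is detected, the step reduces to plain classifier-free guidance. This matches the goal of leaving non-replicating generations untouched.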