SD, LG, AS · Feb 15, 2024

Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion

arXiv:2402.10009v4 · 66 citations · h-index: 33 · ICML
Originality: Incremental advance
AI Analysis

This work addresses the lack of zero-shot editing methods for audio and offers tools for musicians and audio engineers, though the contribution is incremental in that it adapts existing image-domain techniques.

The paper tackles zero-shot audio editing by adapting DDPM inversion techniques from images, introducing ZETA for text-based editing and ZEUS for unsupervised discovery of editing directions, with results demonstrating musically interesting modifications like instrument control and melody improvisations.

Editing signals using large pre-trained models, in a zero-shot manner, has recently seen rapid advancements in the image domain. However, this wave has yet to reach the audio domain. In this paper, we explore two zero-shot editing techniques for audio signals, which use DDPM inversion with pre-trained diffusion models. The first, which we coin ZEro-shot Text-based Audio (ZETA) editing, is adapted from the image domain. The second, named ZEro-shot UnSupervized (ZEUS) editing, is a novel approach for discovering semantically meaningful editing directions without supervision. When applied to music signals, this method exposes a range of musically interesting modifications, from controlling the participation of specific instruments to improvisations on the melody. Samples and code can be found at https://hilamanor.github.io/AudioEditing/ .
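Both techniques rest on DDPM inversion. The sketch below is a minimal, non-authoritative illustration that assumes the "edit-friendly" DDPM inversion variant: the signal is noised independently at every timestep, and the per-step noise maps are then solved for so that running the reverse process reproduces the signal exactly. The noise schedule, the toy noise predictor `toy_eps_model`, and every function name here are hypothetical stand-ins; a real system would use a pre-trained, text-conditioned audio diffusion model in place of the toy predictor. This is not the authors' code.

```python
import numpy as np


def make_schedule(T=100, beta_start=1e-4, beta_end=0.05):
    """Toy linear DDPM noise schedule (assumed; real models differ)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_eps_model(x_t, k, cond):
    """Hypothetical stand-in for a pre-trained, text-conditioned noise predictor."""
    return 0.1 * np.tanh(x_t) + 0.1 * cond * np.sin(k + x_t)


def reverse_mean(x_t, k, cond, sched):
    """DDPM reverse-step mean for step k -> k-1 (schedule index k-1)."""
    betas, alphas, alpha_bars = sched
    i = k - 1
    eps_hat = toy_eps_model(x_t, k, cond)
    return (x_t - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps_hat) / np.sqrt(alphas[i])


def ddpm_invert(x0, cond, sched, rng):
    """Extract noise maps z_1..z_T such that sampling with them reproduces x0."""
    betas, _, alpha_bars = sched
    T = len(betas)
    # Noise x0 independently at every timestep (not along one sampling trajectory).
    xs = [x0] + [
        np.sqrt(alpha_bars[k - 1]) * x0
        + np.sqrt(1.0 - alpha_bars[k - 1]) * rng.standard_normal(x0.shape)
        for k in range(1, T + 1)
    ]
    zs = {}
    for k in range(T, 0, -1):
        sigma = np.sqrt(betas[k - 1])
        # Solve x_{k-1} = mean(x_k) + sigma * z_k for z_k.
        zs[k] = (xs[k - 1] - reverse_mean(xs[k], k, cond, sched)) / sigma
    return xs[T], zs


def ddpm_sample(x_T, zs, cond, sched):
    """Run the reverse process with fixed noise maps; changing `cond` edits."""
    betas = sched[0]
    x = x_T
    for k in range(len(betas), 0, -1):
        x = reverse_mean(x, k, cond, sched) + np.sqrt(betas[k - 1]) * zs[k]
    return x


rng = np.random.default_rng(0)
sched = make_schedule()
x0 = rng.standard_normal(16)  # toy stand-in for an audio latent
x_T, zs = ddpm_invert(x0, cond=0.0, sched=sched, rng=rng)
recon = ddpm_sample(x_T, zs, cond=0.0, sched=sched)   # source condition
edited = ddpm_sample(x_T, zs, cond=1.0, sched=sched)  # different condition
print(np.allclose(recon, x0))  # True
```

Regenerating with the extracted noise maps and the source condition reproduces `x0` up to floating-point error, while swapping in a different condition with the same noise maps produces a modified output. With a real model, the intuition is that the fixed noise maps carry over the source's structure while the new text prompt steers the content. This toy does not attempt ZEUS's unsupervised discovery of editing directions.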

Code implementations: 1 repo
