SD, LG, AS · Jan 21, 2025

Audio Texture Manipulation by Exemplar-Based Analogy

arXiv:2501.12385v1 · 2 citations · h-index: 25 · ICASSP
Originality: Incremental advance
AI Analysis

This work addresses the problem of precise audio editing for sound designers and researchers. The approach is novel in its use of exemplars, but the advance is incremental: it applies existing analogy methods to audio.

The paper tackles audio texture manipulation by proposing an exemplar-based analogy model that uses paired speech examples to learn transformations, outperforming text-conditioned baselines in quantitative evaluations and perceptual studies.

Audio texture manipulation involves modifying the perceptual characteristics of a sound to achieve specific transformations, such as adding, removing, or replacing auditory elements. In this paper, we propose an exemplar-based analogy model for audio texture manipulation. Instead of conditioning on text-based instructions, our method uses paired speech examples, where one clip represents the original sound and another illustrates the desired transformation. The model learns to apply the same transformation to new input, allowing for the manipulation of sound textures. We construct a quadruplet dataset representing various editing tasks, and train a latent diffusion model in a self-supervised manner. We show through quantitative evaluations and perceptual studies that our model outperforms text-conditioned baselines and generalizes to real-world, out-of-distribution, and non-speech scenarios. Project page: https://berkeley-speech-group.github.io/audio-texture-analogy/
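To make the quadruplet idea concrete, here is a minimal sketch of how one training example might be assembled for "add" and "remove" edits. The abstract does not specify the construction, so everything here is an assumption for illustration: the additive mixing, the `gain` parameter, and the names `Quadruplet`, `mix`, `make_add_quadruplet`, and `make_remove_quadruplet` are all hypothetical, and synthetic tones and noise stand in for real speech and texture clips.

```python
import math
import random
from typing import List, NamedTuple

Clip = List[float]  # mono waveform as a plain list of samples (illustrative only)


class Quadruplet(NamedTuple):
    """Hypothetical container: an exemplar pair plus a query pair."""
    exemplar_before: Clip  # original exemplar clip
    exemplar_after: Clip   # the same clip after the edit
    query_input: Clip      # new clip the edit should be applied to
    query_target: Clip     # what the model is expected to produce


def mix(a: Clip, b: Clip, gain: float = 1.0) -> Clip:
    """Sample-wise sum of two equal-length clips, with `gain` applied to b."""
    if len(a) != len(b):
        raise ValueError("clips must have the same length")
    return [x + gain * y for x, y in zip(a, b)]


def make_add_quadruplet(speech_a: Clip, speech_b: Clip,
                        texture: Clip, gain: float = 0.5) -> Quadruplet:
    """Assumed 'add' edit: the same texture is mixed into both speech clips."""
    return Quadruplet(
        exemplar_before=speech_a,
        exemplar_after=mix(speech_a, texture, gain),
        query_input=speech_b,
        query_target=mix(speech_b, texture, gain),
    )


def make_remove_quadruplet(speech_a: Clip, speech_b: Clip,
                           texture: Clip, gain: float = 0.5) -> Quadruplet:
    """Assumed 'remove' edit: the 'add' quadruplet with each pair reversed."""
    q = make_add_quadruplet(speech_a, speech_b, texture, gain)
    return Quadruplet(q.exemplar_after, q.exemplar_before,
                      q.query_target, q.query_input)


def tone(freq_hz: float, n: int = 1600, sr: int = 16000) -> Clip:
    """Sine tone used as a stand-in for a speech clip."""
    return [math.sin(2 * math.pi * freq_hz * i / sr) for i in range(n)]


def noise(n: int = 1600, seed: int = 0) -> Clip:
    """Uniform noise used as a stand-in for a background texture."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    quad = make_add_quadruplet(tone(220.0), tone(330.0), noise())
    print(len(quad.exemplar_after), len(quad.query_target))
```

Because no external labels are involved and the target is derived from the same mixing step as the exemplar, a construction like this would be consistent with the self-supervised training the abstract mentions. The actual dataset pipeline may differ.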

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
