SD, AS
Jun 20, 2018

Synthesizing Diverse, High-Quality Audio Textures

arXiv:1806.08002v1
6 citations
Originality: Incremental advance
AI Analysis

This work addresses audio texture synthesis, a domain-specific problem relevant to audio processing and style transfer, and offers incremental improvements over existing methods.

The paper extends image-based Gram matrix texture synthesis to the audio domain. It adds two terms to the loss: an autocorrelation term to preserve rhythm and a diversity term to encourage unique outputs. Using a VGGish loss for quantitative evaluation, it shows a trade-off between the diversity and the quality of the synthesized audio.

Texture synthesis techniques based on matching the Gram matrix of feature activations in neural networks have achieved spectacular success in the image domain. In this paper we extend these techniques to the audio domain. We demonstrate that synthesizing diverse audio textures is challenging, and argue that this is because audio data is relatively low-dimensional. We therefore introduce two new terms to the original Grammian loss: an autocorrelation term that preserves rhythm, and a diversity term that encourages the optimization procedure to synthesize unique textures. We quantitatively study the impact of our design choices on the quality of the synthesized audio by introducing an audio analogue to the Inception loss which we term the VGGish loss. We show that there is a trade-off between the diversity and quality of the synthesized audio using this technique. We additionally perform a number of experiments to qualitatively study how these design choices impact the quality of the synthesized audio. Finally we describe the implications of these results for the problem of audio style transfer.
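To make the three loss terms named in the abstract concrete, here is a minimal NumPy sketch. The abstract does not give the exact formulas, so everything below is an assumption chosen for illustration: the function names, the `(channels, time)` feature layout, the circular autocorrelation, the inverse-distance form of the diversity term, and the weights `alpha` and `beta` are all hypothetical and may differ from the paper's definitions.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (channels, time) feature map, averaged over time."""
    return features @ features.T / features.shape[1]

def autocorrelation(features):
    """Circular autocorrelation along time for each channel.

    Computed via the FFT: the inverse transform of the power spectrum.
    """
    spectrum = np.fft.rfft(features, axis=1)
    return np.fft.irfft(np.abs(spectrum) ** 2, n=features.shape[1], axis=1)

def texture_loss(synth, target, alpha=1.0, beta=1.0, eps=1e-8):
    """Illustrative combined loss (assumed form, not the paper's exact one)."""
    # Gram term: match channel co-activation statistics.
    gram_term = np.sum((gram_matrix(synth) - gram_matrix(target)) ** 2)
    # Autocorrelation term: match periodic (rhythmic) structure over time.
    autocorr_term = np.mean((autocorrelation(synth) - autocorrelation(target)) ** 2)
    # Diversity term (assumed): grows as the synthesized features approach
    # an exact copy of the target features.
    diversity_term = 1.0 / (np.mean((synth - target) ** 2) + eps)
    return gram_term + alpha * autocorr_term + beta * diversity_term

# Usage: a time-shifted copy of the target matches both statistics exactly,
# so only the diversity term separates it from the target.
rng = np.random.default_rng(0)
target = rng.standard_normal((4, 64))
shifted = np.roll(target, 5, axis=1)
loss_without_diversity = texture_loss(shifted, target, beta=0.0)
loss_with_diversity = texture_loss(shifted, target, beta=1.0)
```

In this sketch, both the Gram term and the circular autocorrelation term are invariant to circular shifts in time, so a shifted copy of the target scores essentially zero on them. That is one way to see why statistic matching alone might yield near-copies of the input, and why a separate term rewarding distance from the target could be useful.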

