SD, LG, AS | Aug 4, 2020

Timbre latent space: exploration and creative aspects

arXiv:2008.01370v2 | 3 citations
AI Analysis

This work addresses the problem of enhancing creative control in audio synthesis for musicians and sound designers, though it is incremental as it builds on existing disentanglement methods.

The paper tackles the limited control over timbre in unsupervised audio models by using disentangled representations in Variational Auto-Encoders with perceptual regularization, enabling continuous inference and synthesis aligned with multi-dimensional timbre spaces. It explores creative applications through experiments with composers using custom interfaces for latent sound synthesis.

Recent studies show that unsupervised models can learn invertible audio representations using Auto-Encoders. These models enable high-quality sound synthesis but offer limited control, since their latent spaces do not disentangle timbre properties. The emergence of disentangled representations has been studied in Variational Auto-Encoders (VAEs) and has been applied to audio. An additional perceptual regularization can align such a latent representation with previously established multi-dimensional timbre spaces, while still allowing continuous inference and synthesis. Alternatively, specific sound attributes can be learned as control variables while unsupervised dimensions account for the remaining features. Generative neural networks enable new possibilities for timbre manipulation, yet the exploration and creative use of their representations remain little studied. The experiments reported here were conducted in cooperation with two composers and propose new creative directions for exploring latent sound synthesis of musical timbres, using specifically designed interfaces (Max/MSP, Pure Data) or mappings for descriptor-based synthesis.
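To make the idea of a perceptually regularized VAE more concrete, here is a minimal sketch of what such an objective could look like. It is not the paper's implementation: the function names, the weights `beta` and `alpha`, and the choice of a squared gap between pairwise latent distances and target dissimilarities are all assumptions made for illustration. The only standard part is the closed-form KL divergence between a diagonal Gaussian and a unit Gaussian.

```python
import math

def kl_diag_gaussian(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))

def perceptual_regularizer(latents, target_dists):
    """Mean squared gap between pairwise latent distances and target
    dissimilarities (hypothetical form; target_dists[i][j] stands in for
    a perceptual dissimilarity rating between sounds i and j)."""
    n = len(latents)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(latents[i], latents[j])  # Euclidean distance
            total += (d - target_dists[i][j]) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0

def regularized_vae_loss(recon_error, mus, logvars, target_dists,
                         beta=1.0, alpha=1.0):
    """Reconstruction error + beta * mean KL + alpha * perceptual term.
    beta and alpha are assumed trade-off weights, not values from the paper."""
    kl = sum(kl_diag_gaussian(m, lv) for m, lv in zip(mus, logvars)) / len(mus)
    reg = perceptual_regularizer(mus, target_dists)
    return recon_error + beta * kl + alpha * reg

# Toy usage: two sounds encoded in a 2-D latent space.
mus = [[0.0, 0.0], [3.0, 4.0]]
logvars = [[0.0, 0.0], [0.0, 0.0]]
target_dists = [[0.0, 5.0], [5.0, 0.0]]
loss = regularized_vae_loss(1.0, mus, logvars, target_dists)
```

In this toy case the latent distance between the two sounds already equals the target dissimilarity, so the perceptual term contributes nothing and the loss reduces to the reconstruction error plus the KL term.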
