LG AI SD Apr 5, 2017

Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders

arXiv:1704.01279v1 756 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of high-quality audio synthesis for music applications, offering incremental improvements in generative modeling for audio.

The paper tackles realistic musical audio generation by introducing a WaveNet-style autoencoder and NSynth, a large-scale dataset of musical notes. The model outperforms a well-tuned spectral autoencoder baseline both qualitatively and quantitatively, and its learned embeddings allow new sounds to be created through timbre interpolation.

Generative models in vision have seen rapid progress due to algorithmic improvements and the availability of high-quality image datasets. In this paper, we offer contributions in both these areas to enable similar progress in audio modeling. First, we detail a powerful new WaveNet-style autoencoder model that conditions an autoregressive decoder on temporal codes learned from the raw audio waveform. Second, we introduce NSynth, a large-scale and high-quality dataset of musical notes that is an order of magnitude larger than comparable public datasets. Using NSynth, we demonstrate improved qualitative and quantitative performance of the WaveNet autoencoder over a well-tuned spectral autoencoder baseline. Finally, we show that the model learns a manifold of embeddings that allows for morphing between instruments, meaningfully interpolating in timbre to create new types of sounds that are realistic and expressive.
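
The abstract's "interpolating in timbre" can be illustrated with a minimal sketch, assuming the simplest possible reading: a frame-by-frame linear blend of two temporal embeddings. The function name `interpolate_codes`, the list-of-frames layout, and the toy values are hypothetical, and whether this matches the authors' exact procedure is an assumption. In the real model the blended code would condition the autoregressive decoder, which is omitted here.

```python
from typing import List

# Hypothetical layout: one embedding vector per time frame.
Code = List[List[float]]


def interpolate_codes(code_a: Code, code_b: Code, alpha: float) -> Code:
    """Linearly blend two temporal embeddings frame by frame.

    alpha = 0.0 returns code_a, alpha = 1.0 returns code_b.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same number of frames")
    blended = []
    for frame_a, frame_b in zip(code_a, code_b):
        if len(frame_a) != len(frame_b):
            raise ValueError("frames must have the same dimensionality")
        blended.append(
            [(1.0 - alpha) * a + alpha * b for a, b in zip(frame_a, frame_b)]
        )
    return blended


# Toy embeddings standing in for encoder outputs of two instruments
# (made-up values, not taken from the paper or from NSynth).
flute = [[0.0, 1.0], [0.5, 0.5]]
organ = [[1.0, 0.0], [0.5, 1.5]]
midpoint = interpolate_codes(flute, organ, 0.5)
```

Sweeping `alpha` from 0 to 1 would trace a straight path between the two codes; the paper's claim is that decoding points along such a path yields sounds that are realistic rather than a simple crossfade of the two waveforms.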

Code Implementations: 8 repos

Data from Papers with Code (CC-BY-SA-4.0)
