ASSD Mar 4, 2019

Analysing Deep Learning-Spectral Envelope Prediction Methods for Singing Synthesis

arXiv:1903.01161v1
3 citations
Originality: Incremental advance
AI Analysis

This work addresses incremental improvements in singing synthesis quality for audio generation applications.

The paper tackles spectral envelope prediction for singing synthesis by investigating neural network hyper-parameters. It finds that 2D convolutions outperform 1D convolutions and that iterated multi-frame prediction during training outperforms noise injection. The proposed architecture, which combines these findings, is reported to produce better results than other state-of-the-art methods.

We investigate various hyper-parameters of neural networks used to generate spectral envelopes for singing synthesis. Two perceptual tests are performed: the first compares two models directly, and the second ranks models by mean opinion score. With these tests we show that, when learning to predict spectral envelopes, 2D convolutions are superior to the previously proposed 1D convolutions, and that predicting multiple frames in an iterated fashion during training is superior to injecting noise into the input data. An experimental investigation of whether it is better to learn to predict a probability distribution or single samples was performed but turned out to be inconclusive. We propose a network architecture that incorporates the improvements we found to be useful, and we show in our experiments that this network produces better results than other state-of-the-art methods.
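The abstract contrasts two training strategies without giving details, so the following is only a minimal sketch of the general idea, not the authors' implementation. With noise injection, the ground-truth input frames are perturbed before a single frame is predicted. With iterated multi-frame prediction, the model's own outputs are fed back as input and the loss covers the whole rollout. All names here (`noisy_inputs`, `iterated_rollout`, `rollout_loss`, `toy_predict`), the frame and bin counts, and the random data are hypothetical; `toy_predict` is a stand-in for a network and simply averages its context.

```python
import numpy as np


def noisy_inputs(frames, sigma, rng):
    """Noise injection: perturb ground-truth context frames with Gaussian noise."""
    return frames + sigma * rng.standard_normal(frames.shape)


def iterated_rollout(predict, context, n_steps):
    """Predict n_steps frames, feeding each prediction back as input.

    predict: callable mapping a (context_len, n_bins) array to one (n_bins,) frame.
    context: (context_len, n_bins) array of ground-truth frames.
    """
    window = context.copy()
    outputs = []
    for _ in range(n_steps):
        frame = predict(window)
        outputs.append(frame)
        # Slide the window: drop the oldest frame, append the model's own output.
        window = np.vstack([window[1:], frame])
    return np.stack(outputs)


def rollout_loss(predict, context, targets):
    """Mean squared error over all frames of the iterated rollout."""
    preds = iterated_rollout(predict, context, len(targets))
    return float(np.mean((preds - targets) ** 2))


def toy_predict(window):
    """Hypothetical stand-in for a network: the mean of the context frames."""
    return window.mean(axis=0)


rng = np.random.default_rng(0)
envelopes = rng.random((12, 8))  # made-up data: 12 frames, 8 spectral bins
context, targets = envelopes[:4], envelopes[4:7]

# Iterated training signal: the error is measured on frames predicted from predictions.
loss_iter = rollout_loss(toy_predict, context, targets)

# Noise-injection training signal: a single-step error from a perturbed context.
noisy_ctx = noisy_inputs(context, sigma=0.05, rng=rng)
loss_noise = float(np.mean((toy_predict(noisy_ctx) - targets[0]) ** 2))
```

In this sketch the iterated loss exposes the predictor to its own accumulated errors, whereas noise injection only approximates such errors with random perturbations of the input. Whether that difference explains the listening-test result is not something the abstract states.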

