AS CL SD

Jun 7, 2020

Parametric Representation for Singing Voice Synthesis: a Comparative Evaluation

Onur Babacan, Thomas Drugman, Tuomo Raitio, Daniel Erro, Thierry Dutoit

arXiv:2006.04142v14.310 citations

Originality Synthesis-oriented

AI Analysis

The paper addresses the problem of adapting speech vocoders to singing voice synthesis, a question of interest to audio researchers. Its contribution is incremental: it evaluates existing methods rather than introducing new ones.

The paper compares four parametric vocoder techniques for singing voice synthesis across three singer types (baritone, counter-tenor and soprano), discusses the artifacts that occur in high-pitched voices, and suggests approaches to mitigate them.

Various parametric representations have been proposed to model the speech signal. While the performance of such vocoders is well-known in the context of speech processing, their extrapolation to singing voice synthesis might not be straightforward. The goal of this paper is twofold. First, a comparative subjective evaluation is performed across four existing techniques suitable for statistical parametric synthesis: traditional pulse vocoder, Deterministic plus Stochastic Model, Harmonic plus Noise Model and GlottHMM. The behavior of these techniques as a function of the singer type (baritone, counter-tenor and soprano) is studied. Secondly, the artifacts occurring in high-pitched voices are discussed and possible approaches to overcome them are suggested.
