AS, CL, SD | Jun 7, 2020

Parametric Representation for Singing Voice Synthesis: a Comparative Evaluation

arXiv:2006.04142v1 | 10 citations
AI Analysis

The paper addresses a problem relevant to audio researchers, namely adapting speech vocoders to singing synthesis, but its contribution is incremental: it evaluates existing methods without introducing new ones.

This paper compares four parametric vocoder techniques for singing voice synthesis across three singer types (baritone, counter-tenor and soprano), discusses the artifacts that occur in high-pitched voices, and suggests approaches to mitigate them.

Various parametric representations have been proposed to model the speech signal. While the performance of such vocoders is well known in the context of speech processing, their extrapolation to singing voice synthesis might not be straightforward. The goal of this paper is twofold. First, a comparative subjective evaluation is performed across four existing techniques suitable for statistical parametric synthesis: the traditional pulse vocoder, the Deterministic plus Stochastic Model, the Harmonic plus Noise Model and GlottHMM. The behavior of these techniques as a function of the singer type (baritone, counter-tenor and soprano) is studied. Second, the artifacts occurring in high-pitched voices are discussed and possible approaches to overcome them are suggested.

