AS SD May 20, 2020

Evaluating Features and Metrics for High-Quality Simulation of Early Vocal Learning of Vowels

Branislav Gerazov, Daniel van Niekerk, Anqi Xu, Paul Konstantin Krug, Peter Birkholz, Yi Xu

arXiv:2005.09986v21.21 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses a domain-specific challenge in speech synthesis for understanding infant vocal learning, but it is incremental as it focuses on evaluating existing feature-metric combinations rather than introducing new methods.

The study tackled the problem of selecting optimal acoustic features and metrics for simulating early vocal learning of vowels using articulatory synthesis, showing that evaluating formant error and error surfaces in F1-F2 space can assess performance and provide perceptual insights.

The way infants use auditory cues to learn to speak despite the acoustic mismatch of their vocal apparatus is a hot topic of scientific debate. The simulation of early vocal learning using articulatory speech synthesis offers a way towards a deeper understanding of this process. One of the crucial parameters in these simulations is the choice of features and a metric to evaluate the acoustic error between the synthesised sound and the reference target. We contribute by evaluating the performance of a set of 40 feature-metric combinations for the task of optimising the production of static vowels with a high-quality articulatory synthesiser. To this end, we assess the usability of formant error and the projection of the feature-metric error surface in the normalised F1-F2 formant space. We show that this approach can be used to evaluate the impact of features and metrics and also to offer insight into perceptual results.
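The idea of scoring a synthesised vowel against a target under several feature-metric combinations can be sketched as follows. This is a minimal illustration, not the authors' setup: the two spectral features, the three metrics, the function names, and the F1/F2 normalisation ranges are all assumptions chosen for the example, and the abstract does not say which 40 combinations were used.

```python
import numpy as np

# Hypothetical feature extractors: each maps a mono signal to a 1-D vector.
def magnitude_spectrum(signal):
    window = np.hanning(len(signal))
    return np.abs(np.fft.rfft(signal * window))

def log_spectrum(signal):
    # Small offset avoids log(0) for empty frequency bins.
    return np.log(magnitude_spectrum(signal) + 1e-10)

# Hypothetical metrics: each maps two feature vectors to a scalar error.
def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def manhattan(a, b):
    return float(np.sum(np.abs(a - b)))

def cosine_distance(a, b):
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

FEATURES = {"magnitude": magnitude_spectrum, "log": log_spectrum}
METRICS = {"euclidean": euclidean, "manhattan": manhattan, "cosine": cosine_distance}

def error_table(synth, target):
    """Acoustic error for every (feature, metric) pair."""
    return {
        (f_name, m_name): metric(feature(synth), feature(target))
        for f_name, feature in FEATURES.items()
        for m_name, metric in METRICS.items()
    }

def normalised_formant_error(f_synth, f_target,
                             f1_range=(200.0, 1000.0),
                             f2_range=(500.0, 3000.0)):
    """Euclidean distance in F1-F2 space after scaling each axis to [0, 1].

    The default ranges (in Hz) are illustrative assumptions only.
    """
    low = np.array([f1_range[0], f2_range[0]])
    span = np.array([f1_range[1] - f1_range[0], f2_range[1] - f2_range[0]])
    a = (np.asarray(f_synth, dtype=float) - low) / span
    b = (np.asarray(f_target, dtype=float) - low) / span
    return float(np.linalg.norm(a - b))
```

In an optimisation loop, one entry of `error_table` would serve as the objective, while `normalised_formant_error` gives a feature-independent check on how close the result lands to the target vowel in formant space.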
