Spectral Study of the Vocal Tract in Vowel Synthesis: A Comparison between 1D and 3D Acoustic Analysis
This work examines limitations of existing speech synthesis models. It is incremental: it clarifies known model discrepancies rather than introducing new methods.
The study compares formant frequencies from a 1D acoustic synthesizer, 3D finite element analysis, and recorded audio for vowel synthesis. Discrepancies between synthesized and recorded formants are hypothesized to stem from the simplified vocal tract geometry and the 1D equations; no specific numerical improvements are reported.
A state-of-the-art 1D acoustic synthesizer was previously developed and coupled to speaker-specific biomechanical models of the oropharynx in ArtiSynth. As expected, the formant frequencies of the synthesized vowel sounds differed from those of the recorded audio. This discrepancy was hypothesized to stem from the simplified geometry of the vocal tract model as well as the one-dimensional implementation of the Navier-Stokes equations. In this paper, we calculate the Helmholtz resonances of our vocal tract geometries using the 3D finite element method (FEM) and compare them with the formant frequencies obtained from the 1D method and from the recorded audio. We hope this comparison helps clarify the limitations of our current models and/or speech synthesizer.
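To make the kind of comparison described above concrete, the following is a minimal sketch, not the synthesizer or the FEM solver used in this work. It assumes the simplest possible 1D idealization, a uniform tube closed at the glottis and open at the lips, whose resonances follow the quarter-wavelength formula f_k = (2k - 1) c / (4L). The function names, the speed of sound, and the tube length are all illustrative assumptions; the second helper shows one plausible way to express the difference between two sets of formants (for example, 1D versus 3D) as percentages.

```python
# Illustrative sketch only: a textbook uniform-tube idealization,
# NOT the ArtiSynth-coupled synthesizer or the 3D FEM analysis from the paper.

SPEED_OF_SOUND = 350.0  # m/s; assumed value for warm, humid air


def uniform_tube_resonances(length_m, n_modes=3, c=SPEED_OF_SOUND):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    Uses the quarter-wavelength formula f_k = (2k - 1) * c / (4 * L).
    """
    return [(2 * k - 1) * c / (4.0 * length_m) for k in range(1, n_modes + 1)]


def percent_differences(reference_hz, candidate_hz):
    """Signed percent difference of each candidate formant from its reference."""
    return [
        100.0 * (cand - ref) / ref
        for ref, cand in zip(reference_hz, candidate_hz)
    ]


if __name__ == "__main__":
    # Hypothetical tube length of 17.5 cm, chosen only for round numbers.
    formants = uniform_tube_resonances(0.175)
    print([round(f, 1) for f in formants])
```

In practice, `reference_hz` would hold formants measured from recorded audio and `candidate_hz` those predicted by a 1D or 3D model; no such values are given here, so none are invented.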