SD Dec 17, 2015
Spectral Study of the Vocal Tract in Vowel Synthesis: A Comparison between 1D and 3D Acoustic Analysis
Negar M. Harandi, Daniel Aalto, Antti Hannukainen et al.
A state-of-the-art 1D acoustic synthesizer has previously been developed and coupled to speaker-specific biomechanical models of the oropharynx in ArtiSynth. As expected, the formant frequencies of the synthesized vowel sounds differed from those of the recorded audio. This discrepancy was hypothesized to be due to the simplified geometry of the vocal tract model as well as the one-dimensional implementation of the Navier-Stokes equations. In this paper, we calculate Helmholtz resonances of our vocal tract geometries using the 3D finite element method (FEM) and compare them with the formant frequencies obtained from the 1D method and from the audio. We hope this comparison helps clarify the limitations of our current models and/or speech synthesizer.
SD Sep 17, 2015
Post-processing speech recordings during MRI
Juha Kuortti, Jarmo Malinen, Antti Ojalammi
We discuss post-processing of speech that has been recorded during Magnetic Resonance Imaging (MRI) of the vocal tract. Such speech recordings are contaminated by high levels of acoustic noise from the MRI scanner. Also, the frequency response of the sound signal path is not flat because MRI technology severely restricts the recording instrumentation. The post-processing algorithm for noise reduction is based on adaptive spectral filtering. The speech material consists of samples of prolonged vowel productions that are used for validation of the post-processing algorithm. The comparison data is recorded in an anechoic chamber from the same test subject. Formant analysis is carried out for the post-processed speech and the comparison data. Artificially noise-contaminated vowel samples are used in validation experiments to determine the performance of the algorithm where using true data would be difficult. Neither the properties of the recording instrumentation nor the post-processing algorithm explains the consistent frequency-dependent discrepancy between formant data from experiments during MRI and in the anechoic chamber. It is shown that the discrepancy is statistically significant, in particular where it is largest, at 1 kHz and 2 kHz. The reflecting surfaces of the MRI head and neck coil are suspected to change the speech acoustics, which results in "external formants" at these frequencies. However, the role of test subject adaptation to noise and constrained space acoustics during an MRI examination cannot be ruled out.
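To make the idea of filtering noise in the spectral domain concrete, the following is a minimal sketch of plain magnitude spectral subtraction on a single frame. It is a stand-in technique, not the adaptive spectral filtering algorithm of the paper, whose details the abstract does not give. The function names, the frame length, the test tones, and the assumption that a noise-only magnitude spectrum is available are all hypothetical choices made for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def spectral_subtract(frame, noise_mag, floor=0.0):
    """Subtract a noise magnitude estimate bin by bin, keeping the noisy phase.

    `floor` (hypothetical parameter) keeps at least that fraction of the
    original magnitude in each bin, so magnitudes never become negative.
    """
    cleaned = []
    for coeff, n_mag in zip(dft(frame), noise_mag):
        mag = max(abs(coeff) - n_mag, floor * abs(coeff))
        cleaned.append(cmath.rect(mag, cmath.phase(coeff)))
    return idft(cleaned)


# Illustrative signals only: a "vowel" tone in bin 4 plus "scanner" tone in bin 10.
N = 64
speech = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
noise = [0.2 * math.sin(2 * math.pi * 10 * t / N) for t in range(N)]
noisy = [s + n for s, n in zip(speech, noise)]

# Assumption: the noise magnitude spectrum is known from a noise-only frame.
noise_mag = [abs(c) for c in dft(noise)]
restored = spectral_subtract(noisy, noise_mag)
```

In this idealized case the noise occupies a bin that the speech tone does not, so the subtraction recovers the clean tone almost exactly. Real scanner noise overlaps the speech spectrum and varies over time, which is why an adaptive scheme is needed in practice.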
DS Aug 29, 2012
How far are vowel formants from computed vocal tract resonances?
Daniel Aalto, Antti Huhtala, Atle Kivelä et al.
We compare numerically computed resonances of the human vocal tract with formants that have been extracted from speech during vowel pronunciation. The geometry of the vocal tract has been obtained by MRI from a male subject, and the corresponding speech has been recorded simultaneously. The resonances are computed by solving the Helmholtz partial differential equation with the Finite Element Method (FEM). Despite a rudimentary exterior space acoustics model, i.e., the Dirichlet boundary condition at the mouth opening, the computed resonance structure differs from the measured formant structure by $\approx$ 0.7 semitones for [i] and [u], which have a small mouth opening area, and by $\approx$ 3 semitones for the vowels [a] and [ae], which have a larger mouth opening. The contribution of the possibly open velar port has not been taken into consideration at all, which adds to the discrepancy for [a] in the present data set. We conclude that by improving the exterior space model and properly treating the velar port opening, it is possible to computationally attain the four lowest vowel formants with an error of less than a semitone. The corresponding wave equation model on MRI-produced vocal tract geometries is expected to have comparable accuracy.
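The semitone figures quoted above can be related to frequencies with a small sketch. It uses the standard definition of a semitone as a frequency ratio of 2^(1/12), and, as a toy substitute for the MRI geometries, a uniform tube that is closed at the glottis and has a Dirichlet (zero pressure) condition at the mouth, for which the Helmholtz resonances are the quarter-wave frequencies (2n - 1)c / (4L). The function names, the tube lengths, and the sound speed of 350 m/s are illustrative assumptions, not values from the paper.

```python
import math


def semitone_distance(f_a, f_b):
    """Signed distance from f_b to f_a in semitones: 12 * log2(f_a / f_b)."""
    return 12.0 * math.log2(f_a / f_b)


def tube_resonances(length_m, count=4, sound_speed=350.0):
    """Resonances (Hz) of a uniform tube, closed at one end, Dirichlet at the other.

    Assumption: uniform cross-section and lossless walls, which is far
    simpler than an MRI-derived vocal tract geometry.
    """
    return [(2 * n - 1) * sound_speed / (4.0 * length_m) for n in range(1, count + 1)]


# Hypothetical tube lengths chosen only to illustrate the scale of a semitone.
short_tube = tube_resonances(0.17)
long_tube = tube_resonances(0.18)
shift = semitone_distance(short_tube[0], long_tube[0])
```

In this toy model, lengthening the tube from 17 cm to 18 cm lowers every resonance by roughly one semitone, which gives a sense of how sensitive the comparison is to the treatment of the mouth opening.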