On the Use of a Spectral Glottal Model for the Source-filter Separation of Speech
This work addresses a specific bottleneck in speech analysis, the accuracy of source-filter separation, and is of interest to both researchers and practitioners. The contribution is incremental, since it builds on the existing IAIF technique.
The paper tackles the problem of estimating glottal flow from speech by improving the separation of glottal features, particularly the high-frequency spectral tilt, which is crucial to the perception of vocal effort. The result is GFM-IAIF, an enhanced method that maintains vocal tract removal performance while significantly improving the perceptual timbral variations associated with vocal effort.
The estimation of glottal flow from a speech waveform is a key method for speech analysis and parameterization. Significant research effort has been devoted to dissociating the first vocal tract resonance from the glottal formant (the low-frequency resonance describing the open phase of the vocal fold vibration). However, few methods cope with the estimation of the high-frequency spectral tilt that describes the return phase of the vocal fold vibration, which is crucial to the perception of vocal effort. This paper proposes an improved version of the well-known Iterative Adaptive Inverse Filtering (IAIF) method, called GFM-IAIF. GFM-IAIF includes a full spectral model of the glottis that incorporates both glottal formant and spectral tilt features. Comparisons with the standard IAIF method show that while GFM-IAIF maintains good performance on vocal tract removal, it significantly improves the perceptual timbral variations associated with vocal effort.
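To make the two glottal features named above concrete, the following is a minimal sketch of a toy spectral glottal model, not the GFM-IAIF algorithm itself. It assumes the glottal formant can be represented by a resonant pole pair and the spectral tilt by a single real pole (a first-order low-pass). The function names, the pole placement formulas, and all parameter values (sampling rate, formant frequency and bandwidth, tilt cut-off) are illustrative assumptions, not taken from the paper.

```python
import cmath
import math


def glottal_model_poles(fs, fg, bg, fc):
    """Return three z-plane poles of a toy glottal spectral model.

    fs: sampling rate (Hz)
    fg, bg: glottal formant centre frequency and bandwidth (Hz), assumed values
    fc: cut-off frequency (Hz) of the first-order low-pass modelling spectral tilt
    """
    # Resonant pole pair: radius set by bandwidth, angle by centre frequency.
    radius = math.exp(-math.pi * bg / fs)
    angle = 2.0 * math.pi * fg / fs
    formant_pole = cmath.rect(radius, angle)
    # Real pole: a lower cut-off moves it closer to z = 1, giving a steeper tilt.
    tilt_pole = complex(math.exp(-2.0 * math.pi * fc / fs), 0.0)
    return [formant_pole, formant_pole.conjugate(), tilt_pole]


def magnitude_db(poles, fs, f):
    """Magnitude (dB) at frequency f of the all-pole filter 1 / prod(1 - p z^-1)."""
    z = cmath.exp(1j * 2.0 * math.pi * f / fs)
    h = complex(1.0, 0.0)
    for p in poles:
        h /= (1.0 - p / z)
    return 20.0 * math.log10(abs(h))


def spectral_slope_db(fs, fg, bg, fc, f_lo=100.0, f_hi=4000.0):
    """Level difference (dB) between f_hi and f_lo for the toy model."""
    poles = glottal_model_poles(fs, fg, bg, fc)
    return magnitude_db(poles, fs, f_hi) - magnitude_db(poles, fs, f_lo)


if __name__ == "__main__":
    fs = 16000.0
    # Same glottal formant, two hypothetical tilt cut-offs.
    for fc in (500.0, 4000.0):
        slope = spectral_slope_db(fs, fg=150.0, bg=100.0, fc=fc)
        print(f"tilt cut-off {fc:6.0f} Hz -> level(4 kHz) - level(100 Hz) = {slope:.1f} dB")
```

In this sketch, lowering the tilt cut-off attenuates the high frequencies more strongly while leaving the glottal formant poles unchanged, which is the kind of timbral variation the abstract associates with vocal effort.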