Excitation-based Voice Quality Analysis and Modification
This work addresses voice quality modification in speech synthesis, but the contribution is incremental: it builds on existing HMM-based methods and adds specific excitation modification rules.
The paper analyzes excitation differences across modal, soft, and loud voice qualities in a single-speaker corpus and uses the resulting modification rules to build a voice quality transformation system applied as a post-process to HMM-based speech synthesis. The system is reported to achieve the transformations while maintaining the delivered quality.
This paper investigates the differences occurring in the excitation for different voice qualities. Its goal is two-fold. First, a large corpus containing three voice qualities (modal, soft and loud) uttered by the same speaker is analyzed, and significant differences in characteristics extracted from the excitation are observed. Second, rules of modification derived from the analysis are used to build a voice quality transformation system applied as a post-process to HMM-based speech synthesis. The system is shown to effectively achieve the transformations while maintaining the delivered quality.