Talking Condition Recognition in Stressful and Emotional Talking Environments Based on CSPHMM2s
This work addresses the problem of improving talking condition recognition accuracy in stressful and emotional talking environments, with applications in human-computer interaction. Its contribution is incremental, since it builds on existing HMM-based methods.
This paper tackles talking condition recognition in stressful and emotional environments using Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s) as classifiers. CSPHMM2s outperform the baseline HMMs, CHMM2s, and SPHMMs; recognition performance in the stressful environment leads that in the emotional environment by 3.67%; and subjective evaluation by human judges falls within 2.14% (stressful) and 3.08% (emotional) of the objective CSPHMM2s results.
This work exploits Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s) as classifiers to enhance talking condition recognition in stressful and emotional talking environments, which are treated as two completely separate environments. The stressful talking environment uses the Speech Under Simulated and Actual Stress (SUSAS) database, while the emotional talking environment uses the Emotional Prosody Speech and Transcripts (EPST) database. The results, obtained using Mel-Frequency Cepstral Coefficients (MFCCs) as features, demonstrate that CSPHMM2s outperform each of Hidden Markov Models (HMMs), Second-Order Circular Hidden Markov Models (CHMM2s), and Suprasegmental Hidden Markov Models (SPHMMs) in talking condition recognition in both the stressful and the emotional talking environments. The results also show that, based on CSPHMM2s, talking condition recognition performance in the stressful talking environment leads that in the emotional talking environment by 3.67%. The results of subjective evaluation by human judges fall within 2.14% and 3.08% of those obtained with CSPHMM2s in the stressful and emotional talking environments, respectively.
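To make the classification setup concrete, the following is a minimal sketch of likelihood-based talking condition recognition: one model per talking condition, with the recognized condition being the one whose model assigns the highest likelihood to the observation sequence. It rests on several assumptions that are not taken from the paper: it uses ordinary first-order HMMs with discrete observation symbols rather than CSPHMM2s over MFCC vectors, and the model parameters, the condition names "neutral" and "angry", and the function names are hypothetical and chosen only for illustration.

```python
import math


def safe_log(p):
    """Return log(p), mapping zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm in log space for a first-order discrete HMM.

    obs   : non-empty list of observation symbol indices
    start : start[i] is the initial probability of state i
    trans : trans[i][j] is the probability of moving from state i to j
    emit  : emit[i][k] is the probability of symbol k in state i
    """
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][o])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def classify(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hypothetical toy models (start, trans, emit), not estimated from any data.
MODELS = {
    "neutral": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.7, 0.3]]),
    "angry": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.2, 0.8], [0.3, 0.7]]),
}

if __name__ == "__main__":
    print(classify([0, 0, 0, 0], MODELS))
    print(classify([1, 1, 1, 1], MODELS))
```

In a real system the discrete symbols would be replaced by MFCC feature vectors with continuous emission densities, and each model would be trained on utterances of the corresponding talking condition. The decision rule (pick the condition whose model scores highest) is the part this sketch is meant to show.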