SEGAA: A Unified Approach to Predicting Age, Gender, and Emotion in Speech
This work addresses the need for efficient multi-task prediction in speech analysis for applications such as customer service and healthcare, but it is incremental: it builds on prior methods that predict each attribute individually.
This paper tackles the problem of simultaneously predicting age, gender, and emotion from speech and finds that the multi-output SEGAA model performs comparably to individual models while improving runtime.
The interpretation of human voices is important across many applications. This study predicts age, gender, and emotion from vocal cues. Advances in voice analysis technology span several domains, from improving customer interactions to enhancing healthcare and retail experiences. Discerning emotions can aid mental health applications, while age and gender detection are vital in various contexts. The paper explores deep learning models for these predictions by comparing single, multi-output, and sequential models. Sourcing suitable data proved challenging, so the CREMA-D and EMO-DB datasets were combined. Prior work showed promise in predicting each variable individually, but limited research has considered all three simultaneously. This paper identifies flaws in the individual-model approach and advocates for a novel multi-output learning architecture, the Speech-based Emotion Gender and Age Analysis (SEGAA) model. The experiments suggest that multi-output models perform comparably to individual models, efficiently capturing the intricate relationships between variables and speech inputs while achieving improved runtime.
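The multi-output idea can be illustrated with a minimal sketch: one shared layer processes the speech features once, and three small heads predict age, gender, and emotion from that shared representation. This is not the SEGAA architecture, whose layers are not described in this summary. The class counts in `TASK_SIZES`, the 13-value feature vector, the layer sizes, and the names `MultiOutputSketch` and `joint_loss` are all assumptions made for illustration.

```python
import math
import random

# Hypothetical class counts per task; the paper's actual label sets are not given here.
TASK_SIZES = {"age": 3, "gender": 2, "emotion": 4}


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases for an n_in -> n_out layer."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiOutputSketch:
    """Shared trunk followed by one classification head per task (illustrative only)."""

    def __init__(self, n_features, n_hidden, task_sizes, seed=0):
        rng = random.Random(seed)
        self.trunk = init_layer(rng, n_features, n_hidden)
        self.heads = {name: init_layer(rng, n_hidden, k) for name, k in task_sizes.items()}

    def forward(self, features):
        # The shared representation is computed once and reused by every head.
        hidden = [max(0.0, h) for h in linear(features, *self.trunk)]
        return {name: softmax(linear(hidden, *head)) for name, head in self.heads.items()}


def joint_loss(probs, labels):
    """Sum of per-task cross-entropy terms, one common way to train heads jointly."""
    return sum(-math.log(probs[task][labels[task]]) for task in labels)


# Usage with a made-up 13-value feature vector (an assumed input size).
model = MultiOutputSketch(n_features=13, n_hidden=8, task_sizes=TASK_SIZES)
features = [0.1 * i for i in range(13)]
probs = model.forward(features)
loss = joint_loss(probs, {"age": 1, "gender": 0, "emotion": 2})
```

In a design like this, a single forward pass yields all three predictions, which is one plausible reason a multi-output model could run faster than three separate models; the source reports the runtime improvement but this summary does not state its cause.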