Emotion Classification: How Does an Automated System Compare to Naive Human Coders?
This study addresses the debate over automated versus human emotion classification for human-computer interaction applications. It makes incremental progress by showing that an existing automated method outperforms naive human coders.
The study compared a state-of-the-art automated system with 138 naive human coders on speech-based emotion classification into six emotion categories, three arousal classes, and three valence classes. The computer system outperformed the humans in almost all cases and could further improve its accuracy by rejecting uncertain classifications.
The vital role that emotions play in social interactions, along with the demand for novel human-computer interaction applications, has led to the development of a number of automatic emotion classification systems. However, it is still debatable whether the performance of such systems is comparable to that of human coders. To address this issue, we present a comprehensive comparison on a speech-based emotion classification task between 138 Amazon Mechanical Turk workers (Turkers) and a state-of-the-art automatic computer system. The comparison covers classifying speech utterances into six emotions (happy, neutral, sad, anger, disgust, and fear), into three arousal classes (active, passive, and neutral), and into three valence classes (positive, negative, and neutral). The results show that the computer system outperforms the naive Turkers in almost all cases. Furthermore, the computer system can increase its classification accuracy by declining to classify utterances for which it is not confident. In contrast, the Turkers do not show significantly higher classification accuracy on utterances they are confident about than on those they are not.
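The rejection mechanism described above trades coverage for accuracy. The sketch below illustrates that trade-off under stated assumptions. It assumes a classifier that outputs one probability per emotion class, and it rejects an utterance when the highest probability falls below a confidence threshold. The function names, the threshold values, and the toy probabilities are hypothetical illustrations. They are not the study's actual system, rejection rule, or data.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed class order (from the six emotions listed above):
# 0=happy, 1=neutral, 2=sad, 3=anger, 4=disgust, 5=fear


def classify_with_rejection(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the most probable class index, or None to reject.

    Hypothetical rule: reject when the top class probability is below `threshold`.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return None  # not confident enough: decline to classify
    return best


def accuracy_and_coverage(
    prob_rows: Sequence[Sequence[float]], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Accuracy on accepted utterances, and fraction of utterances accepted."""
    accepted = 0
    correct = 0
    for probs, label in zip(prob_rows, labels):
        pred = classify_with_rejection(probs, threshold)
        if pred is None:
            continue  # rejected utterances do not count toward accuracy
        accepted += 1
        correct += pred == label
    coverage = accepted / len(labels) if labels else 0.0
    accuracy = correct / accepted if accepted else float("nan")
    return accuracy, coverage


# Toy, made-up posteriors for four utterances (illustration only).
rows: List[List[float]] = [
    [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],  # confident, correct (happy)
    [0.30, 0.25, 0.15, 0.10, 0.10, 0.10],  # unconfident, wrong
    [0.05, 0.05, 0.80, 0.04, 0.03, 0.03],  # confident, correct (sad)
    [0.20, 0.22, 0.18, 0.15, 0.13, 0.12],  # unconfident, wrong
]
labels: List[int] = [0, 1, 2, 3]

if __name__ == "__main__":
    for t in (0.0, 0.5):
        acc, cov = accuracy_and_coverage(rows, labels, t)
        print(f"threshold={t}: accuracy={acc:.2f}, coverage={cov:.2f}")
```

On this toy data, a threshold of 0.0 accepts every utterance with accuracy 0.50. A threshold of 0.5 accepts half of them with accuracy 1.00. This shows how rejecting low-confidence outputs can raise accuracy on the utterances that remain, provided the classifier's confidence correlates with correctness.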