EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels
This addresses the problem of limited emotive speech data for researchers and developers in games, television, and graphic novels, though it is incremental as it focuses on a single-speaker dataset.
The authors tackled the need for emotive speech in interactive narratives by creating the EMNS corpus, a 2.3-hour dataset with eight emotional states, which achieved the highest average scores in conveying emotions and expressiveness compared to other datasets.
The increasing adoption of text-to-speech technologies has led to a growing demand for natural and emotive voices that adapt to a conversation's context and emotional tone. The Emotive Narrative Storytelling (EMNS) corpus is a unique speech dataset created to enhance conversations' expressiveness and emotive quality in interactive narrative-driven systems. The corpus consists of a 2.3-hour recording featuring a female speaker delivering labelled utterances. It encompasses eight acted emotional states, evenly distributed with a variance of 0.68%, along with expressiveness levels and natural language descriptions with word emphasis labels. The evaluation of audio samples from different datasets revealed that the EMNS corpus achieved the highest average scores in accurately conveying emotions and demonstrating expressiveness. It outperformed other datasets in conveying shared emotions and achieved comparable levels of genuineness. A classification task confirmed the accurate representation of intended emotions in the corpus, with participants recognising the recordings as genuine and expressive. Additionally, the availability of the dataset collection tool under the Apache 2.0 License simplifies remote speech data collection for researchers.