AS · AI · SD · Nov 18, 2022

A Persian ASR-based SER: Modification of Sharif Emotional Speech Database and Investigation of Persian Text Corpora

arXiv:2211.09956v1 · 3 citations · h-index: 13
Originality: Synthesis-oriented
AI Analysis

This work addresses data-quality issues in Persian SER. The contribution is incremental: it modifies an existing database and applies known methods to a specific language domain.

The paper corrects inconsistencies in the Sharif Emotional Speech Database (ShEMO), a Persian database, by using an Automatic Speech Recognition (ASR) system and investigating Farsi language models. It also introduces a Persian ASR-based Speech Emotion Recognition (SER) system that uses linguistic features and deep learning models.

Recognizing emotion in speech is one of the essential perceptual abilities humans use to understand a situation and decide how to interact with others. In recent years, researchers have therefore tried to add emotion recognition to human-machine communication systems. Since Speech Emotion Recognition (SER) relies on labeled data, databases are essential to it. Incomplete, low-quality, or defective data may lead to inaccurate predictions. In this paper, we fix the inconsistencies in the Sharif Emotional Speech Database (ShEMO), a Persian database, by using an Automatic Speech Recognition (ASR) system and investigating the effect of Farsi language models obtained from accessible Persian text corpora. We also introduce a Persian/Farsi ASR-based SER system that uses linguistic features of the ASR outputs and Deep Learning-based models.
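The abstract does not say how transcript inconsistencies were detected. One plausible check, sketched here purely as an illustration, is to compare each database transcript with the ASR hypothesis for the same utterance and flag pairs whose word error rate (WER) is high. The function names, the `(utterance_id, reference, hypothesis)` tuple layout, and the 0.5 threshold are all assumptions made for this sketch, not details taken from the paper.

```python
def edit_distance(ref_tokens, hyp_tokens):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp_tokens) + 1))
    for i, ref_tok in enumerate(ref_tokens, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp_tokens, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER of an ASR hypothesis against a whitespace-tokenized reference."""
    ref_tokens = reference.split()
    hyp_tokens = hypothesis.split()
    if not ref_tokens:
        # Convention for this sketch: empty reference matches only an empty hypothesis.
        return 0.0 if not hyp_tokens else 1.0
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def flag_inconsistent(entries, threshold=0.5):
    """Return ids of utterances whose transcript disagrees with the ASR output.

    `entries` is an iterable of (utterance_id, reference, hypothesis) tuples;
    the threshold is a hypothetical value and would need tuning.
    """
    return [utt_id for utt_id, ref, hyp in entries
            if word_error_rate(ref, hyp) > threshold]
```

Flagged utterances would still need manual review, since a high WER can reflect an ASR error as easily as a faulty database transcript.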

