SD | CL | AS | Sep 1, 2025

ArabEmoNet: A Lightweight Hybrid 2D CNN-BiLSTM Model with Attention for Robust Arabic Speech Emotion Recognition

arXiv:2509.01401v1 | 2 citations | h-index: 6 | Proceedings of The Third Arabic Natural Language Processing Conference
Originality: Incremental advance
AI Analysis

This work addresses speech emotion recognition for Arabic, a low-resource language, with an incremental improvement in efficiency for resource-constrained environments.

The paper addresses Arabic speech emotion recognition by introducing ArabEmoNet, a lightweight model reported to achieve state-of-the-art performance with only 1 million parameters, making it 90 times smaller than HuBERT base and 74 times smaller than Whisper.

Speech emotion recognition is vital for human-computer interaction, particularly for low-resource languages like Arabic, which face challenges due to limited data and research. We introduce ArabEmoNet, a lightweight architecture designed to overcome these limitations and deliver state-of-the-art performance. Unlike previous systems relying on discrete MFCC features and 1D convolutions, which miss nuanced spectro-temporal patterns, ArabEmoNet uses Mel spectrograms processed through 2D convolutions, preserving critical emotional cues often lost in traditional methods. While recent models favor large-scale architectures with millions of parameters, ArabEmoNet achieves superior results with just 1 million parameters, 90 times smaller than HuBERT base and 74 times smaller than Whisper. This efficiency makes it ideal for resource-constrained environments. ArabEmoNet advances Arabic speech emotion recognition, offering exceptional performance and accessibility for real-world applications.
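To make the 1-million-parameter budget concrete, the sketch below counts the trainable parameters of a 2D CNN-BiLSTM-attention stack over a Mel spectrogram. Every layer size here (64 Mel bins, three 3x3 convolutions with 32/64/128 channels, 2x2 pooling after each, a BiLSTM with hidden size 96, a single-layer attention scorer, 4 emotion classes) is a hypothetical choice made for illustration, not the configuration from the paper. The LSTM count assumes the PyTorch convention of two bias vectors per gate.

```python
# Parameter-count sketch for a hypothetical 2D CNN -> BiLSTM -> attention model.
# All layer sizes are illustrative assumptions, not the authors' settings.

def conv2d_params(c_in, c_out, k=3):
    # k*k*c_in weights plus one bias per output channel
    return (k * k * c_in + 1) * c_out

def lstm_params(input_size, hidden, bidirectional=True):
    # 4 gates, each with input weights, recurrent weights, and two biases
    # (PyTorch-style bias_ih and bias_hh)
    per_direction = 4 * (hidden * input_size + hidden * hidden + 2 * hidden)
    return per_direction * (2 if bidirectional else 1)

def linear_params(n_in, n_out):
    return (n_in + 1) * n_out

def count_params(n_mels=64, channels=(32, 64, 128), hidden=96, n_classes=4):
    total = 0
    c_in, freq = 1, n_mels              # spectrogram enters as one channel
    for c_out in channels:
        total += conv2d_params(c_in, c_out)
        c_in = c_out
        freq //= 2                      # 2x2 max pooling halves the Mel axis
    features = c_in * freq              # per-frame vector fed to the BiLSTM
    total += lstm_params(features, hidden)
    total += linear_params(2 * hidden, 1)          # attention score per frame
    total += linear_params(2 * hidden, n_classes)  # emotion classifier
    return total

total = count_params()
print(total)                # 955333, i.e. just under 1 million
print(total * 90, total * 74)  # sizes of models 90x and 74x larger
```

In this hypothetical configuration the BiLSTM input projection dominates the count, so the number of Mel bins and the pooling schedule are the main levers for staying within a budget of this size.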
