A Simple Attention-Based Mechanism for Bimodal Emotion Classification
This work addresses emotion classification for AI applications, improving accuracy by integrating multimodal data, though it appears incremental because it builds on existing deep learning and attention methods.
The paper tackles emotion classification by proposing a novel bimodal deep learning architecture with an attention mechanism that uses both text and speech data, and shows that it outperforms single-modality approaches and several state-of-the-art systems.
Big data contain rich information that machine learning algorithms can use to learn important features during classification tasks. Human beings express emotion through words, speech (tone, pitch, speed), or facial expressions. Artificial intelligence approaches to emotion classification are largely based on learning from textual information. However, public datasets containing both text and speech data provide sufficient resources to train machine learning algorithms for the task of emotion classification. In this paper, we present novel bimodal deep learning-based architectures, enhanced with an attention mechanism, that are trained and tested on text and speech data for emotion classification. We report details of the different deep learning-based architectures and show the performance of each, including rigorous error analyses. Our findings suggest that deep learning-based architectures trained on both types of data (text and speech) outperform architectures trained only on text or only on speech. Our proposed attention-based bimodal architecture outperforms several state-of-the-art systems in emotion classification.
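
The abstract does not specify how attention is applied across the two modalities. As a purely illustrative sketch, and not the architecture proposed in the paper, the following shows one common way to fuse a text feature vector and a speech feature vector with attention weights. The function names (`softmax`, `attention_fuse`), the dot-product scoring, the `query` vector (which would be learned in a real model), and all numeric values are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(text_vec, speech_vec, query):
    """Fuse two equal-length modality vectors with dot-product attention.

    Hypothetical helper: each modality is scored against `query`,
    the scores are normalized with softmax, and the fused vector is
    the attention-weighted sum of the two modality vectors.
    """
    modalities = [text_vec, speech_vec]
    # One scalar relevance score per modality.
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in modalities]
    weights = softmax(scores)
    # Weighted sum, computed dimension by dimension.
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, modalities))
        for i in range(len(query))
    ]
    return fused, weights


# Made-up feature vectors standing in for encoder outputs.
text_vec = [0.2, 0.9, -0.1]
speech_vec = [0.7, -0.3, 0.4]
query = [1.0, 0.5, 0.0]  # would be a learned parameter in practice

fused, weights = attention_fuse(text_vec, speech_vec, query)
print("attention weights (text, speech):", weights)
print("fused representation:", fused)
```

In a full model, the fused vector would typically be passed to a classification layer that predicts the emotion label, and the weights indicate how much each modality contributed for a given input.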