CLAI May 9, 2025

Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models

arXiv:2505.06110v2
4 citations
h-index: 1
Originality: Incremental advance
AI Analysis

This work is an incremental improvement in sentiment analysis that combines multiple data types, with applications such as social media or customer feedback analysis.

This project tackled multimodal sentiment analysis by integrating text, audio, and visual data using transformer-based models with early fusion, achieving 97.87% accuracy and a 0.9682 F1-score on the CMU-MOSEI dataset.

This project performs multimodal sentiment analysis on the CMU-MOSEI dataset, using transformer-based models with early fusion to integrate the text, audio, and visual modalities. We employ BERT-based encoders for each modality and concatenate the extracted embeddings before classification. The model achieves 97.87% 7-class accuracy and a 0.9682 F1-score on the test set, demonstrating the effectiveness of early fusion in capturing cross-modal interactions. Training used Adam optimization (lr=1e-4), dropout (0.3), and early stopping to support generalization and robustness. The results highlight the strength of transformer architectures in modeling multimodal sentiment, and the low MAE (0.1060) indicates precise sentiment intensity prediction. Future work may compare fusion strategies or improve interpretability. Overall, the approach combines linguistic, acoustic, and visual cues for sentiment analysis.
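The early-fusion step described in the abstract (per-modality embeddings concatenated before a classifier) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the toy embedding sizes, and the single linear head with placeholder identity weights are all assumptions made for illustration. In the paper the embeddings come from BERT-based encoders and the classifier is trained with Adam, dropout, and early stopping.

```python
from typing import List, Sequence


def early_fusion(text_emb: Sequence[float],
                 audio_emb: Sequence[float],
                 visual_emb: Sequence[float]) -> List[float]:
    """Concatenate per-modality embeddings into one fused feature vector."""
    return list(text_emb) + list(audio_emb) + list(visual_emb)


def linear_head(fused: Sequence[float],
                weights: Sequence[Sequence[float]],
                bias: Sequence[float]) -> List[float]:
    """Compute one logit per class: logits[k] = dot(weights[k], fused) + bias[k]."""
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused vector length")
    return [sum(w * x for w, x in zip(row, fused)) + b
            for row, b in zip(weights, bias)]


def predict_class(logits: Sequence[float]) -> int:
    """Return the index of the largest logit."""
    return max(range(len(logits)), key=lambda k: logits[k])


# Toy embeddings (hypothetical values; real encoder outputs are much larger).
text_emb = [0.2, -0.1, 0.4]
audio_emb = [0.5, 0.0]
visual_emb = [-0.3, 0.1]

fused = early_fusion(text_emb, audio_emb, visual_emb)

# Placeholder 7-class head: identity weights and zero bias, NOT trained values.
num_classes = 7
weights = [[1.0 if i == j else 0.0 for j in range(len(fused))]
           for i in range(num_classes)]
bias = [0.0] * num_classes

logits = linear_head(fused, weights, bias)
predicted = predict_class(logits)
print(len(fused), predicted)
```

The point of the sketch is that the classifier sees a single vector whose length is the sum of the three embedding sizes, so it can weight features from different modalities jointly. Late fusion, by contrast, would classify each modality separately and then combine the per-modality predictions.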
