Benchmarking Multimodal Sentiment Analysis
The paper addresses sentiment analysis for multimodal data and offers a new benchmark for further research, but the contribution is incremental: it builds on existing methods with added features.
The paper tackles multimodal sentiment analysis and emotion recognition by proposing a framework that uses convolutional neural networks for feature extraction from text and visual modalities; combining visual, text and audio features yields a 10% performance improvement over state-of-the-art methods.
We propose a framework for multimodal sentiment analysis and emotion recognition using convolutional neural network-based feature extraction from text and visual modalities. We obtain a performance improvement of 10% over the state of the art by combining visual, text and audio features. We also discuss some major issues frequently ignored in multimodal sentiment analysis research: the role of speaker-independent models, the importance of the modalities, and generalizability. The paper thus serves as a new benchmark for further research in multimodal sentiment analysis and also demonstrates the different facets of analysis to be considered while performing such tasks.
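Two ideas from the abstract, combining per-modality features and evaluating speaker-independent models, can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the fusion scheme or the split protocol, so plain concatenation and a random hold-out of whole speakers are assumptions, and the names `fuse_features` and `speaker_independent_split` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def fuse_features(
    text: Sequence[float],
    visual: Sequence[float],
    audio: Sequence[float],
) -> List[float]:
    """Feature-level fusion by concatenation (an assumed, simple scheme)."""
    return list(text) + list(visual) + list(audio)


def speaker_independent_split(
    speaker_ids: Sequence[str],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Split utterance indices so that no speaker is in both train and test."""
    # Shuffle the distinct speakers deterministically, then hold some out.
    speakers = sorted(set(speaker_ids))
    random.Random(seed).shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])

    # Assign every utterance according to its speaker, never individually.
    train_idx = [i for i, s in enumerate(speaker_ids) if s not in test_speakers]
    test_idx = [i for i, s in enumerate(speaker_ids) if s in test_speakers]
    return train_idx, test_idx


if __name__ == "__main__":
    ids = ["a", "a", "b", "c", "c", "d", "e"]
    train, test = speaker_independent_split(ids)
    print("train speakers:", sorted({ids[i] for i in train}))
    print("test speakers:", sorted({ids[i] for i in test}))
```

Splitting by speaker rather than by utterance keeps a model from scoring well merely by recognizing voices or faces it saw during training, which is the concern behind the speaker-independent setting.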