CV | SD | AS | Aug 30, 2022

Video-based Cross-modal Auxiliary Network for Multimodal Sentiment Analysis

arXiv:2208.13954v1 | 35 citations | h-index: 13 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses multimodal sentiment analysis for applications relying on audiovisual data, presenting an incremental improvement by enhancing feature extraction and reducing redundancy.

The paper addresses insufficient unimodal feature extraction and data redundancy in multimodal sentiment analysis by proposing a Video-based Cross-modal Auxiliary Network (VCAN). The authors report higher classification accuracy than state-of-the-art methods on the RAVDESS, CMU-MOSI, and CMU-MOSEI benchmarks.

Multimodal sentiment analysis has a wide range of applications because the modalities it combines carry complementary information. Previous works focus mainly on learning efficient joint representations, but they rarely consider insufficient unimodal feature extraction or data redundancy in multimodal fusion. In this paper, a Video-based Cross-modal Auxiliary Network (VCAN) is proposed, which comprises an audio features map module and a cross-modal selection module. The first module is designed to substantially increase feature diversity in audio feature extraction, aiming to improve classification accuracy by providing more comprehensive acoustic representations. The second module is designed to efficiently filter redundant visual frames when integrating audiovisual data. Moreover, a classifier group consisting of several image classification networks is introduced to predict sentiment polarities and emotion categories. Extensive experimental results on the RAVDESS, CMU-MOSI, and CMU-MOSEI benchmarks indicate that VCAN significantly outperforms state-of-the-art methods in classification accuracy for multimodal sentiment analysis.
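
The abstract does not say how the cross-modal selection module decides which visual frames are redundant. Purely as an illustration of the general idea of dropping near-duplicate frames, the sketch below keeps a frame only when it differs enough from the last kept frame. The mean-absolute-difference criterion, the `threshold` parameter, the flattened-list frame representation, and the function names are all assumptions made for this example; they are not taken from the paper or its repository.

```python
from typing import List, Sequence

# Hypothetical representation: a frame is a flattened list of grayscale pixels.
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_frames(frames: List[Frame], threshold: float) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame.

    This is a simple stand-in for redundancy filtering, not the paper's method.
    """
    if not frames:
        return []
    kept = [0]  # always keep the first frame as the initial reference
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


video = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],  # near-duplicate of frame 0
    [1.0, 1.0, 1.0, 1.0],  # large change, kept
    [1.0, 1.0, 0.9, 1.0],  # near-duplicate of frame 2
    [0.0, 1.0, 0.0, 1.0],  # large change, kept
]
print(select_frames(video, threshold=0.2))  # prints [0, 2, 4]
```

A learned selection module would presumably score frames using audio and visual features rather than raw pixel differences; the fixed threshold here only makes the notion of "redundant frame" concrete.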

Code Implementations: 1 repo
