CL · CV · LG · Dec 1, 2022

Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis

arXiv:2212.00678v1 · 13 citations · h-index: 43
Originality: Incremental advance
AI Analysis

This work addresses parameter efficiency in multimodal models for sentiment analysis, offering an incremental improvement over existing methods.

The paper tackles the high parameter cost of multimodal learning by proposing Adapted Multimodal BERT (AMB), which uses adapter modules and layer-wise fusion to integrate audio-visual information with text while the pretrained language model stays frozen. On CMU-MOSEI sentiment analysis, it reports a 3.4% relative reduction in error and a 2.1% relative improvement in 7-class classification accuracy.
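Both headline figures are relative changes, which are easy to confuse with absolute point differences. The short sketch below shows the arithmetic. The baseline and new values are made-up numbers chosen only for illustration; they are not results from the paper, and `relative_change` is a hypothetical helper name.

```python
def relative_change(baseline: float, new: float) -> float:
    """Signed change of `new` with respect to `baseline`, as a fraction of `baseline`."""
    return (new - baseline) / baseline


# Hypothetical values, NOT taken from the paper.
baseline_error, new_error = 0.500, 0.483
baseline_acc, new_acc = 0.520, 0.531

# A negative relative change in error is a reduction, so flip the sign.
error_reduction = -relative_change(baseline_error, new_error)
acc_improvement = relative_change(baseline_acc, new_acc)

print(f"relative error reduction: {error_reduction:.1%}")
print(f"relative accuracy improvement: {acc_improvement:.1%}")
print(f"absolute accuracy gain: {(new_acc - baseline_acc) * 100:.1f} points")
```

In this illustration a relative accuracy improvement of about 2% corresponds to an absolute gain of roughly one percentage point, which is why the two should not be reported interchangeably.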

Multimodal learning pipelines have benefited from the success of pretrained language models. However, this comes at the cost of increased model parameters. In this work, we propose Adapted Multimodal BERT (AMB), a BERT-based architecture for multimodal tasks that uses a combination of adapter modules and intermediate fusion layers. The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations. During the adaptation process, the pretrained language model parameters remain frozen, allowing for fast, parameter-efficient training. In our ablations, we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise. Our experiments on sentiment analysis with CMU-MOSEI show that AMB outperforms the current state-of-the-art across metrics, with a 3.4% relative reduction in the resulting error and a 2.1% relative improvement in 7-class classification accuracy.
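The abstract's recipe (frozen language-model layers, a small trainable adapter, and a trainable fusion step at every layer) can be sketched roughly as follows. This is a toy NumPy illustration under several assumptions that the abstract does not confirm: the adapter is a standard bottleneck with a residual connection, the fusion is a gated additive merge, the audio-visual features are already aligned to the text tokens, and each "frozen layer" is a simple stand-in, not a real BERT layer. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, D_AV, R, N_LAYERS = 5, 16, 8, 4, 3  # toy sizes, all illustrative


def linear(d_in, d_out):
    """Return (weight, bias) with a small random weight; illustrative init only."""
    return rng.normal(0.0, 0.02, size=(d_in, d_out)), np.zeros(d_out)


def frozen_layer(h, params):
    """Stand-in for one frozen pretrained layer (NOT a real BERT layer)."""
    w, b = params
    return np.tanh(h @ w + b)


def adapter(h, down, up):
    """Bottleneck adapter: project down, ReLU, project up, add the residual."""
    (w_d, b_d), (w_u, b_u) = down, up
    return h + (np.maximum(h @ w_d + b_d, 0.0) @ w_u + b_u)


def fuse(h, av, proj, gate):
    """Assumed gated additive fusion of audio-visual features into text states."""
    (w_p, b_p), (w_g, b_g) = proj, gate
    av_proj = av @ w_p + b_p  # map audio-visual features to the hidden size
    logits = np.concatenate([h, av_proj], axis=-1) @ w_g + b_g
    g = 1.0 / (1.0 + np.exp(-logits))  # per-dimension gate in (0, 1)
    return h + g * av_proj


# "frozen" would stay fixed during training; the other entries would be trained.
layers = [
    {"frozen": linear(D, D), "down": linear(D, R), "up": linear(R, D),
     "proj": linear(D_AV, D), "gate": linear(2 * D, D)}
    for _ in range(N_LAYERS)
]


def forward(text_states, av_feats):
    """Apply frozen layer, then adapter, then fusion, once per layer."""
    h = text_states
    for p in layers:
        h = frozen_layer(h, p["frozen"])
        h = adapter(h, p["down"], p["up"])
        h = fuse(h, av_feats, p["proj"], p["gate"])
    return h


text_states = rng.normal(size=(T, D))  # token representations
av_feats = rng.normal(size=(T, D_AV))  # token-aligned audio-visual features
out = forward(text_states, av_feats)
print(out.shape)
```

Because only the adapter and fusion weights would receive gradient updates in such a setup, the number of trained parameters depends on the bottleneck and projection sizes, not on the size of the frozen backbone.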

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
