CL, Mar 15, 2022

Modular and Parameter-Efficient Multimodal Fusion with Prompting

Sheng Liang, Mengjie Zhao, Hinrich Schütze

arXiv:2203.08055v132.4647 citations, h-index: 70

Originality: Incremental advance

AI Analysis

This work addresses the need for parameter-efficient and modular fusion methods for multimodal tasks, but it is incremental as it builds on existing prompting techniques.

The paper tackles the problem of efficient multimodal fusion in large-scale pre-trained models by using prompt vectors to align modalities, achieving comparable performance to other methods in low-resource settings.

Recent research has made impressive progress in large-scale multimodal pre-training. In the context of the rapid growth of model size, it is necessary to seek efficient and flexible methods other than finetuning. In this paper, we propose to use prompt vectors to align the modalities. Our method achieves comparable performance to several other multimodal fusion methods in low-resource settings. We further show that our method is modular and parameter-efficient for processing tasks involving two or more data modalities.
