CV | May 15, 2023

Parameter-efficient Tuning of Large-scale Multimodal Foundation Model

arXiv:2305.08381v3 | 41 citations | Has Code
Originality: Incremental advance
AI Analysis

This work addresses efficiency and alignment issues in multimodal AI for researchers and practitioners, representing an incremental improvement with novel lightweight designs.

The paper tackles the challenge of reducing complexity and improving modality alignment in parameter-efficient tuning of large-scale multimodal foundation models, achieving state-of-the-art results on six cross-modal benchmarks and even outperforming full fine-tuning.

Driven by the progress of large-scale pre-training, parameter-efficient transfer learning has gained immense popularity across different subfields of Artificial Intelligence. The core idea is to adapt the model to downstream tasks by training only a small set of parameters. Recently, researchers have applied these proven techniques to multimodal tasks and achieved promising results. However, two critical issues remain unresolved: how to further reduce complexity with a lightweight design, and how to boost alignment between modalities with extremely few parameters. In this paper, we propose A graceful prompt framework for cross-modal transfer (Aurora) to overcome these challenges. Considering the redundancy in existing architectures, we first use mode approximation to generate 0.1M trainable parameters that implement multimodal prompt tuning, which explores the low intrinsic dimension with only 0.04% of the pre-trained model's parameters. Then, for better modality alignment, we propose the Informative Context Enhancement and Gated Query Transformation modules for settings with extremely few parameters. A thorough evaluation on six cross-modal benchmarks shows that Aurora not only outperforms the state of the art but also outperforms full fine-tuning. Our code is available at: https://github.com/WillDreamer/Aurora.
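
The abstract does not spell out what "mode approximation" computes. As a rough illustration only, the sketch below assumes it means a CP (canonical polyadic) decomposition of the stacked weight updates, where every adapted matrix shares two factor matrices and differs only in a small per-matrix coefficient vector. The function names, the zero initialization, and all dimensions (768-wide layers, 48 adapted matrices, rank 64) are hypothetical choices made to show how a budget near 0.1M parameters could arise; they are not taken from the paper or its repository. Note also that the two figures in the abstract together imply a backbone of roughly 0.1M / 0.0004 = 250M parameters.

```python
import numpy as np


def cp_delta_weights(U, V, P, lam):
    """Rebuild N weight updates from shared CP factors.

    U: (d_out, R) shared output-side factor
    V: (d_in, R)  shared input-side factor
    P: (N, R)     per-matrix coefficients
    lam: (R,)     per-component scales
    Returns an array of shape (N, d_out, d_in) where
    delta[n] = sum_r lam[r] * P[n, r] * outer(U[:, r], V[:, r]).
    """
    return np.einsum("r,nr,ir,jr->nij", lam, P, U, V)


def trainable_params(d_out, d_in, n_mats, rank):
    """Count entries in U, V, P and lam (the only trained tensors here)."""
    return rank * (d_out + d_in + n_mats) + rank


# Small demo so the reconstruction is cheap to run.
rng = np.random.default_rng(0)
d_out, d_in, n_mats, rank = 8, 6, 3, 2
U = rng.standard_normal((d_out, rank))
V = np.zeros((d_in, rank))  # assumption: zero init so tuning starts from the pre-trained weights
P = rng.standard_normal((n_mats, rank))
lam = np.ones(rank)
delta = cp_delta_weights(U, V, P, lam)

# Hypothetical larger configuration: parameter budget only, no arrays allocated.
budget = trainable_params(768, 768, 48, 64)
dense = 48 * 768 * 768  # cost of updating the same matrices densely
print(f"CP factors: {budget} params, dense update: {dense} params")
```

Under these assumed dimensions the factors hold about 0.1M values, while a dense update of the same 48 matrices would need over 28M. Sharing `U` and `V` across matrices is what keeps the count low: adding another adapted matrix costs only `rank` extra coefficients in `P`.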

