AI · CL · LG · Dec 11, 2023

MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples

arXiv:2312.06363v3 · 12 citations · h-index: 12 · Has Code · ACM Trans. Multim. Comput. Commun. Appl.
AI Analysis

This work targets AI researchers and practitioners who want better performance on multi-modal tasks. It is an incremental advance that integrates in-context learning into fine-tuning.

The paper tackles the problem of improving multi-modal fine-tuning by introducing Multi-Modal In-Context Tuning (MMICT), which leverages the in-context learning capabilities of multi-modal LLMs and delivers significant performance gains over traditional fine-tuning and the vanilla ICT method across diverse downstream tasks.

Although In-Context Learning (ICL) brings remarkable performance gains to Large Language Models (LLMs), the improvements still lag behind fine-tuning on downstream tasks. This paper introduces Multi-Modal In-Context Tuning (MMICT), a novel multi-modal fine-tuning paradigm that boosts multi-modal fine-tuning by fully leveraging the promising ICL capability of multi-modal LLMs (MM-LLMs). We propose the Multi-Modal Hub (M-Hub), a unified module that captures various multi-modal features according to different inputs and objectives. Based on M-Hub, MMICT enables MM-LLMs to learn from in-context visual-guided textual features and subsequently generate outputs conditioned on the textual-guided visual features. Moreover, leveraging the flexibility of M-Hub, we design a variety of in-context demonstrations. Extensive experiments on a diverse range of downstream multi-modal tasks demonstrate that MMICT significantly outperforms the traditional fine-tuning strategy and the vanilla ICT method that directly takes the concatenation of all information from different modalities as input. Our implementation is available at: https://github.com/KDEGroup/MMICT.
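To make the contrast between MMICT and vanilla ICT concrete, here is a minimal Python sketch of how the two input sequences might be assembled. It is not taken from the authors' repository: `Example`, `m_hub`, `build_mmict_input`, and `build_vanilla_ict_input` are hypothetical names, and tagged strings stand in for the feature tensors a real M-Hub would compute. The sketch assumes each demonstration contributes its visual-guided textual features followed by its target output, and that the query contributes its textual-guided visual features followed by its input text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    image_id: str  # placeholder for the raw image input (hypothetical)
    text: str      # textual input, e.g. a question or prompt
    target: str    # expected output; unused for the query


def m_hub(ex: Example, mode: str) -> str:
    """Hypothetical stand-in for M-Hub.

    Returns a tagged string in place of the feature tensor that a real
    module would compute from the image and text of `ex`.
    """
    if mode == "visual_guided_text":
        return f"VgT[{ex.image_id}|{ex.text}]"
    if mode == "text_guided_visual":
        return f"TgV[{ex.image_id}|{ex.text}]"
    raise ValueError(f"unknown M-Hub mode: {mode}")


def build_mmict_input(demos: List[Example], query: Example) -> List[str]:
    """Assemble an MMICT-style input sequence (assumed layout)."""
    seq: List[str] = []
    for d in demos:
        # Demonstrations: visual-guided textual features, then the answer.
        seq.append(m_hub(d, "visual_guided_text"))
        seq.append(d.target)
    # Query: textual-guided visual features, then its text (assumption);
    # the MM-LLM would generate its output after this point.
    seq.append(m_hub(query, "text_guided_visual"))
    seq.append(query.text)
    return seq


def build_vanilla_ict_input(demos: List[Example], query: Example) -> List[str]:
    """Vanilla ICT: concatenate raw information from every modality."""
    seq: List[str] = []
    for d in demos:
        seq.extend([f"IMG[{d.image_id}]", d.text, d.target])
    seq.extend([f"IMG[{query.image_id}]", query.text])
    return seq


if __name__ == "__main__":
    demos = [Example("img1", "What animal is shown?", "a cat")]
    query = Example("img2", "What color is the car?", "")
    print(build_mmict_input(demos, query))
    print(build_vanilla_ict_input(demos, query))
```

The point the sketch illustrates is structural: vanilla ICT places raw image features and text side by side for every example, whereas MMICT routes each example through M-Hub and requests a different feature type for the demonstrations than for the query.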

Code Implementations: 1 repo
