LG · AI · CL · CV · Aug 23, 2023

InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4

arXiv:2308.12067v2 · 75 citations · h-index: 18 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses data efficiency in training multimodal models. It offers a way to reduce data requirements while improving performance. The contribution is incremental, since it builds on existing fine-tuning paradigms.

The paper tackles fine-tuning of multimodal large language models. It shows that a model fine-tuned on only 200 high-quality instruction examples (about 6% of the instruction-following data used to align MiniGPT-4) can outperform the original MiniGPT-4 on various evaluations.

Multimodal large language models are typically trained in two stages: first pre-training on image-text pairs, and then fine-tuning using supervised vision-language instruction data. Recent studies have shown that large language models can achieve satisfactory results even with a limited amount of high-quality instruction-following data. In this paper, we introduce InstructionGPT-4, which is fine-tuned on a small dataset comprising only 200 examples, amounting to approximately 6% of the instruction-following data used in the alignment dataset for MiniGPT-4. To achieve this, we first propose several metrics to assess the quality of multimodal instruction data. Based on these metrics, we present an effective and trainable data selector to automatically identify and filter low-quality vision-language data. By employing this method, InstructionGPT-4 outperforms the original MiniGPT-4 on various evaluations. Overall, our findings demonstrate that less but high-quality instruction-tuning data is efficient at enabling multimodal large language models to generate better output. Our code is available at https://github.com/waltonfuture/InstructionGPT-4.
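The abstract describes the selection pipeline only at a high level. Each vision-language example gets quality indicators, a trainable selector maps those indicators to a quality score, and the best 200 examples are kept. The Python sketch below illustrates that score-and-rank pattern under stated assumptions. The indicator values, the quality labels used for fitting, and all names (`Example`, `score`, `fit_linear_selector`, `select_top_k`) are hypothetical. A plain linear regressor fitted by gradient descent stands in for the paper's trainable data selector, whose actual architecture and training signal are not described here. The final plain top-k cut may also differ from how the authors choose their subset.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Example:
    """One vision-language instruction example (hypothetical schema)."""
    example_id: str
    indicators: List[float]  # hypothetical per-example quality indicators


def score(weights: Sequence[float], indicators: Sequence[float]) -> float:
    """Linear quality score; the last weight is the bias term."""
    # zip stops at len(indicators), so the bias weight is added separately.
    return sum(w * x for w, x in zip(weights, indicators)) + weights[-1]


def fit_linear_selector(
    features: Sequence[Sequence[float]],
    labels: Sequence[float],
    lr: float = 0.1,
    epochs: int = 1000,
) -> List[float]:
    """Fit selector weights by batch gradient descent on mean squared error.

    This is a stand-in for the paper's trainable selector, not its method.
    """
    n, d = len(features), len(features[0])
    weights = [0.0] * (d + 1)  # d feature weights plus one bias
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, y in zip(features, labels):
            err = score(weights, x) - y
            for j in range(d):
                grad[j] += 2.0 * err * x[j] / n
            grad[d] += 2.0 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def select_top_k(
    examples: Sequence[Example], weights: Sequence[float], k: int
) -> List[Example]:
    """Rank examples by predicted quality and keep the k highest-scoring ones."""
    ranked = sorted(examples, key=lambda e: score(weights, e.indicators), reverse=True)
    return ranked[:k]
```

In this sketch, calling `select_top_k(pool, weights, 200)` would play the role of the 200-example budget. Any diversity constraints a real selector might apply are not modeled.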

Code Implementations: 3 repos
