CV · Dec 9, 2024

Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness

arXiv:2412.06293v2 · 7 citations · h-index: 22 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the computational cost of MLLM fine-tuning, a practical concern for AI researchers. The advance is incremental because it builds on existing data selection methods.

The paper tackles data redundancy in visual instruction datasets for multi-modal large language models by proposing DataTailor, a collaborative framework that selects data based on informativeness, uniqueness, and representativeness, achieving 101.3% of full-data performance with only 15% of the data.

Instruction tuning fine-tunes pre-trained Multi-modal Large Language Models (MLLMs) to handle real-world tasks. However, the rapid expansion of visual instruction datasets introduces data redundancy, leading to excessive computational costs. We propose a collaborative framework, DataTailor, which leverages three key principles--informativeness, uniqueness, and representativeness--for effective data selection. We argue that a valuable sample should be informative of the task, non-redundant, and representative of the sample distribution (i.e., not an outlier). We further propose practical ways to score against each principle, which automatically adapt to a given dataset without tedious hyperparameter tuning. Comprehensive experiments on various benchmarks demonstrate that DataTailor achieves 101.3% of the performance of full-data fine-tuning with only 15% of the data, significantly reducing computational costs while maintaining superior results. This exemplifies the "Less is More" philosophy in MLLM development. The code and data are available at https://github.com/Yuqifan1117/DataTailor.
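To make the three principles concrete, here is a minimal sketch of how a selection step along these lines might look. It is not DataTailor's actual scoring, which the abstract does not specify. The function name `select_subset`, the use of cosine similarity over per-sample embeddings, the min-max normalization, and the equal-weight sum are all assumptions made for illustration; the informativeness scores are taken as a given input.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def select_subset(embeddings, informativeness, keep_ratio=0.15):
    """Return sorted indices of the samples kept under a combined score.

    Illustrative proxies only (assumptions, not the paper's definitions):
      - informativeness: supplied by the caller, one score per sample
      - uniqueness: 1 minus similarity to the nearest other sample
      - representativeness: mean similarity to all other samples
    """
    n = len(embeddings)
    if n < 2 or len(informativeness) != n:
        raise ValueError("need at least two samples and one score per sample")

    sims = [[cosine(embeddings[i], embeddings[j]) for j in range(n)]
            for i in range(n)]
    uniqueness = [1.0 - max(sims[i][j] for j in range(n) if j != i)
                  for i in range(n)]
    representativeness = [sum(sims[i][j] for j in range(n) if j != i) / (n - 1)
                          for i in range(n)]

    # Equal-weight sum of the three normalized scores (a hypothetical choice).
    combined = [a + b + c for a, b, c in zip(min_max(informativeness),
                                             min_max(uniqueness),
                                             min_max(representativeness))]

    # Keep the top fraction of samples, always at least one.
    k = max(1, round(keep_ratio * n))
    ranked = sorted(range(n), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:k])
```

With 20 samples and the default `keep_ratio=0.15`, the function keeps 3 indices, mirroring the 15% budget reported in the abstract. Uniqueness and representativeness pull in opposite directions here (isolated samples are unique but unrepresentative), which is why a combined score is used rather than any single criterion.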

