CV · Oct 3, 2025

LEAML: Label-Efficient Adaptation to Out-of-Distribution Visual Tasks for Multimodal Large Language Models

Ci-Siang Lin, Min-Hung Chen, Yu-Yang Sheng, Yu-Chiang Frank Wang

arXiv:2510.03232v1 · 3.6 · h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses the high cost of labeling data in specialized domains, although as an adaptation framework it appears to be an incremental advance.

The paper tackles the problem of adapting multimodal large language models to out-of-distribution visual tasks, such as medical imaging, when labeled data is limited. In experiments on gastrointestinal endoscopy and sports VQA, the method consistently outperforms standard fine-tuning.

Multimodal Large Language Models (MLLMs) have achieved strong performance on general visual benchmarks but struggle with out-of-distribution (OOD) tasks in specialized domains such as medical imaging, where labeled data is limited and expensive. We introduce LEAML, a label-efficient adaptation framework that leverages both scarce labeled VQA samples and abundant unlabeled images. Our approach generates domain-relevant pseudo question-answer pairs for unlabeled data using a QA generator regularized by caption distillation. Importantly, we selectively update only those neurons most relevant to question answering, enabling the QA generator to efficiently acquire domain-specific knowledge during distillation. Experiments on gastrointestinal endoscopy and sports VQA demonstrate that LEAML consistently outperforms standard fine-tuning under minimal supervision, highlighting the effectiveness of the framework.
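The abstract does not say how question-answering relevance is measured or how the selective update is applied. The sketch below assumes one plausible scheme purely for illustration: score each output neuron of a single linear layer by the mean absolute gradient of a stand-in QA loss, keep the top fraction, and zero the gradients of every other neuron during a stand-in caption-distillation step. The functions `neuron_relevance`, `select_neurons`, and `masked_sgd_step`, the squared-error losses, and the 25% selection fraction are all hypothetical and are not LEAML's actual procedure.

```python
import numpy as np

# Hypothetical sketch of selective neuron updating; NOT the LEAML implementation.
# A single linear layer W (n_out x n_in) stands in for the model; each row is a "neuron".

def _mse_grad(W, X, Y):
    """Gradient of L = mean_s ||X_s @ W.T - Y_s||^2 with respect to W."""
    residual = X @ W.T - Y                 # shape (n_samples, n_out)
    return 2.0 * residual.T @ X / len(X)   # shape (n_out, n_in)

def neuron_relevance(W, X_qa, Y_qa):
    """Assumed relevance score: mean |gradient| of a stand-in QA loss per neuron."""
    return np.abs(_mse_grad(W, X_qa, Y_qa)).mean(axis=1)

def select_neurons(scores, fraction):
    """Boolean mask marking the top `fraction` of neurons by relevance score."""
    k = max(1, int(round(fraction * len(scores))))
    mask = np.zeros(len(scores), dtype=bool)
    mask[np.argsort(scores)[-k:]] = True
    return mask

def masked_sgd_step(W, X_cap, Y_cap, mask, lr=0.1):
    """One SGD step on a stand-in distillation loss; unselected rows stay frozen."""
    grad = _mse_grad(W, X_cap, Y_cap)
    grad[~mask] = 0.0                      # block updates to non-selected neurons
    return W - lr * grad

# Toy usage with random data in place of QA and caption supervision.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
X_qa, Y_qa = rng.normal(size=(16, 4)), rng.normal(size=(16, 8))
X_cap, Y_cap = rng.normal(size=(16, 4)), rng.normal(size=(16, 8))
mask = select_neurons(neuron_relevance(W, X_qa, Y_qa), fraction=0.25)
W_new = masked_sgd_step(W, X_cap, Y_cap, mask)
```

Masking gradients rather than detaching layers keeps the selection at the granularity of individual neurons, which matches the abstract's phrasing; a real MLLM would apply the same idea across many layers, and the relevance criterion there may differ from the gradient-magnitude heuristic assumed here.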

