CL · Aug 25, 2025

Language-Specific Layer Matters: Efficient Multilingual Enhancement for Large Vision-Language Models

arXiv:2508.18381v1 · 1 citation · h-index: 10 · EMNLP
Originality: Incremental advance
AI Analysis

The paper addresses the imbalanced, costly-to-fix multilingual capabilities of vision-language models, a problem for users who need cross-lingual applications, and represents an incremental improvement in efficiency.

The paper tackles the imbalance in multilingual capabilities of large vision-language models. It finds that language-specific neuron activations in shallow layers correlate with multilingual understanding, and introduces PLAST, a training recipe that fine-tunes only 14% of the parameters to improve performance on benchmarks such as MM-Bench and MMMB.
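The summary says PLAST picks layers by monitoring language-specific neuron activations but does not state the exact criterion. The sketch below assumes an entropy-based one: for each neuron, take the probability that it fires on tokens of each language, normalize across languages, and call the neuron language-specific when that distribution has low entropy and it fires often enough in at least one language. Layers with the most such neurons are selected. The thresholds, the `top_k` cut-off, and all function names are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import List, Sequence


def neuron_entropy(probs: Sequence[float]) -> float:
    """Entropy of one neuron's per-language activation probabilities, normalized to sum to 1."""
    total = sum(probs)
    if total == 0:
        return float("inf")  # neuron never fires: treat it as not language-specific
    ent = 0.0
    for p in probs:
        q = p / total
        if q > 0:
            ent -= q * math.log(q)
    return ent


def count_language_specific(layer_probs: Sequence[Sequence[float]],
                            entropy_threshold: float, min_prob: float) -> int:
    """Count neurons in one layer that fire mostly for a single language (assumed criterion)."""
    return sum(
        1
        for probs in layer_probs
        if max(probs) >= min_prob and neuron_entropy(probs) <= entropy_threshold
    )


def select_layers(act_probs: Sequence[Sequence[Sequence[float]]],
                  entropy_threshold: float = 0.5, min_prob: float = 0.1,
                  top_k: int = 2) -> List[int]:
    """Return the indices of the top_k layers with the most language-specific neurons.

    act_probs[layer][neuron][lang] is the fraction of that language's tokens on which
    the neuron's activation is positive (hypothetical input format).
    Ties are broken in favor of shallower layers.
    """
    counts = [count_language_specific(layer, entropy_threshold, min_prob) for layer in act_probs]
    ranked = sorted(range(len(counts)), key=lambda i: (-counts[i], i))
    return sorted(ranked[:top_k])


# Toy example: 4 layers, 3 neurons per layer, 3 languages (made-up numbers).
act_probs = [
    [[0.9, 0.0, 0.0], [0.0, 0.8, 0.0], [0.5, 0.5, 0.5]],   # layer 0: 2 specific neurons
    [[0.0, 0.0, 0.7], [0.6, 0.05, 0.0], [0.3, 0.3, 0.3]],  # layer 1: 2 specific neurons
    [[0.4, 0.4, 0.4], [0.2, 0.2, 0.1], [0.0, 0.0, 0.0]],   # layer 2: 0
    [[0.05, 0.0, 0.0], [0.5, 0.5, 0.5], [0.3, 0.3, 0.0]],  # layer 3: 0
]
print(select_layers(act_probs))  # → [0, 1]
```

In this toy input the shallow layers hold the language-specific neurons, mirroring the correlation the paper reports; on a real model the activation probabilities would be collected by running multilingual text through the network and recording which neurons fire.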

Large vision-language models (LVLMs) have demonstrated exceptional capabilities in understanding visual information with human languages but also exhibit an imbalance in multilingual capabilities. In this work, we delve into the multilingual working pattern of LVLMs and identify a salient correlation between the multilingual understanding ability of LVLMs and language-specific neuron activations in shallow layers. Building on this insight, we introduce PLAST, a training recipe that achieves efficient multilingual enhancement for LVLMs by Precise LAnguage-Specific layers fine-Tuning. PLAST first identifies layers involved in multilingual understanding by monitoring language-specific neuron activations. These layers are then precisely fine-tuned with question-translation pairs to achieve multilingual alignment. Our empirical results on MM-Bench and MMMB demonstrate that PLAST effectively improves the multilingual capabilities of LVLMs and achieves significant efficiency with only 14% of the parameters tuned. Further analysis reveals that PLAST can be generalized to low-resource and complex visual reasoning tasks, facilitating the language-specific visual information engagement in shallow layers.
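Once layers are chosen, PLAST fine-tunes only those layers, which is how it ends up tuning about 14% of the parameters. The sketch below assumes the language-model blocks are named with a `.layers.<index>.` pattern (a hypothetical naming scheme; real checkpoints vary, and a vision tower may reuse the same word) and shows how the set of trainable parameters and the tuned fraction follow from the layer selection. It does not cover the question-translation training data itself.

```python
import re
from typing import Dict, Iterable, Set, Tuple

# Hypothetical naming scheme for decoder blocks, e.g. "language_model.layers.3.mlp.weight".
LAYER_PATTERN = re.compile(r"\.layers\.(\d+)\.")


def plan_trainable(param_sizes: Dict[str, int],
                   selected_layers: Iterable[int]) -> Tuple[Set[str], float]:
    """Return the parameter names to unfreeze and the fraction of all parameters they cover."""
    selected = set(selected_layers)
    trainable = set()
    for name in param_sizes:
        match = LAYER_PATTERN.search(name)
        if match is not None and int(match.group(1)) in selected:
            trainable.add(name)
    total = sum(param_sizes.values())
    tuned = sum(param_sizes[name] for name in trainable)
    return trainable, tuned / total


# Toy model: embeddings, a vision encoder, and 5 decoder layers of 100 parameters each.
param_sizes = {"embed_tokens.weight": 200, "vision_tower.weight": 300}
for i in range(5):
    param_sizes[f"language_model.layers.{i}.attn.weight"] = 40
    param_sizes[f"language_model.layers.{i}.mlp.weight"] = 60

trainable, fraction = plan_trainable(param_sizes, [0, 1])
print(sorted(trainable))
print(fraction)  # → 0.2
```

With PyTorch, the returned names could be applied by iterating `model.named_parameters()` and setting `param.requires_grad = name in trainable`, so the optimizer only updates the selected shallow layers.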
