CL · Apr 30, 2025

Investigating the Effect of Parallel Data in the Cross-Lingual Transfer for Vision-Language Encoders

Andrei-Alexandru Manea, Jindřich Libovický

arXiv:2504.21681v2 · 1 citation · h-index: 4 · TSD

Originality: Incremental advance

AI Analysis

This work addresses the challenge of multilingual vision-language tasks for non-English speakers, but it is incremental as it builds on existing cross-lingual transfer methods.

The study investigated how parallel data affects cross-lingual transfer for vision-language encoders, finding that machine-translated task data performed best on average, but authentic caption-like data outperformed it in some languages, and that multilingual training benefits most languages.

Most pre-trained Vision-Language (VL) models and the training data for downstream tasks are available only in English. Multilingual VL tasks are therefore solved using cross-lingual transfer: either fine-tuning a multilingual pre-trained model or transferring the text encoder using parallel data. We study the second approach: transferring an already trained encoder using parallel data. We investigate two properties of the parallel data, its domain and the number of languages, which previous work did not focus on. Our results show that even though machine-translated task data are the best on average, caption-like authentic parallel data outperformed them in some languages. Further, we show that most languages benefit from multilingual training.
