CL, May 24, 2023

Meta-learning For Vision-and-language Cross-lingual Transfer

arXiv:2305.14843v2, 132 citations
Originality Highly original
AI Analysis

This paper addresses cross-lingual transfer in vision-language models for multilingual applications. It represents an incremental improvement: a novel method for a known bottleneck.

The paper tackles the poor performance of pre-trained vision-language models in zero-shot and few-shot cross-lingual transfer, especially for low-resource languages, by proposing a meta-learning fine-tuning framework that boosts performance on vision-language understanding tasks and datasets.

Current pre-trained vision-language models (PVLMs) achieve excellent performance on a range of multi-modal datasets. Recent work has aimed at building multilingual models, and a range of novel multilingual multi-modal datasets have been proposed. Current PVLMs typically perform poorly on these datasets when used for multi-modal zero-shot or few-shot cross-lingual transfer, especially for low-resource languages. To alleviate this problem, we propose a novel meta-learning fine-tuning framework. Our framework makes current PVLMs rapidly adaptive to new languages in vision-language scenarios by designing MAML in a cross-lingual multi-modal manner. Experiments show that our method boosts the performance of current state-of-the-art PVLMs in both zero-shot and few-shot cross-lingual transfer on a range of vision-language understanding tasks and datasets (XVNLI, xGQA, MaRVL, xFlickr&CO).
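
The abstract names MAML but gives no algorithmic detail here. As a rough illustration of generic MAML only, and not the paper's cross-lingual multi-modal variant, the sketch below meta-learns a single weight `w` for the toy model `y = w * x` over several tasks that differ in slope. The tasks loosely stand in for different languages; the model, the task sampler, the learning rates, and all function names are assumptions made for this example.

```python
import random

def loss(w, xs, ys):
    """Mean squared error of the toy model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w, xs, ys):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def sample_task(rng, slope, n=10):
    """Hypothetical task: n noiseless points from y = slope * x."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return xs, [slope * x for x in xs]

def adapt(w, xs, ys, inner_lr=0.1):
    """Inner loop: one gradient step on a task's support set."""
    return w - inner_lr * grad(w, xs, ys)

def maml_train(slopes, inner_lr=0.1, outer_lr=0.05, steps=500, seed=0):
    """Outer loop: update w so that one inner step works well on every task."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        meta_grad = 0.0
        for slope in slopes:
            sx, sy = sample_task(rng, slope)  # support set (used to adapt)
            qx, qy = sample_task(rng, slope)  # query set (used to evaluate)
            w_adapted = adapt(w, sx, sy, inner_lr)
            # Chain rule through the inner step:
            # d(w_adapted)/dw = 1 - inner_lr * 2 * mean(x^2) for this model.
            d_adapt = 1 - inner_lr * 2 * sum(x * x for x in sx) / len(sx)
            meta_grad += grad(w_adapted, qx, qy) * d_adapt
        w -= outer_lr * meta_grad / len(slopes)
    return w

if __name__ == "__main__":
    w0 = maml_train([1.0, 2.0, 3.0])
    xs, ys = sample_task(random.Random(1), 3.0)
    print("meta-learned w:", w0)
    print("loss before / after one adaptation step:",
          loss(w0, xs, ys), loss(adapt(w0, xs, ys), xs, ys))
```

Because the toy model has one parameter, the second-order term of MAML can be written by hand as `d_adapt`. A real PVLM would need automatic differentiation or a first-order approximation instead.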

