CL | Oct 13, 2022

Low-resource Neural Machine Translation with Cross-modal Alignment

arXiv:2210.06716v1 | 24.0290 citations | h-index: 29 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses translation for low-resource languages by leveraging the visual modality. It is rated incremental because it builds on existing cross-modal methods.

The paper tackles neural machine translation for low-resource languages by aligning text with visual data through cross-modal contrastive learning, and reports significant improvements over a text-only baseline in zero-shot and few-shot scenarios.

How can neural machine translation be achieved with limited parallel data? Existing techniques often rely on large-scale monolingual corpora, which is impractical for some low-resource languages. In this paper, we instead connect several low-resource languages to a particular high-resource one through an additional visual modality. Specifically, we propose a cross-modal contrastive learning method to learn a shared space for all languages, introducing both a coarse-grained sentence-level objective and a fine-grained token-level one. Experimental results and further analysis show that our method effectively learns cross-modal and cross-lingual alignment from a small number of image-text pairs and achieves significant improvements over the text-only baseline under both zero-shot and few-shot scenarios.
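To make the sentence-level objective more concrete, here is a minimal sketch of a symmetric InfoNCE-style contrastive loss between sentence and image embeddings. It is not the authors' implementation: the function names, the temperature value, the symmetric form, and the use of pre-pooled embedding vectors are all assumptions made for illustration, and the token-level objective is not shown.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def sentence_contrastive_loss(text_emb, image_emb, temperature=0.1):
    """Symmetric InfoNCE over a batch (hypothetical helper, assumed form).

    text_emb[i] and image_emb[i] are assumed to be a matched sentence-image
    pair; every other image in the batch serves as a negative for sentence i,
    and vice versa.
    """
    t = l2_normalize(np.asarray(text_emb, dtype=float))
    v = l2_normalize(np.asarray(image_emb, dtype=float))
    logits = t @ v.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(t))
    loss_t2v = -log_softmax(logits, axis=1)[idx, idx].mean()  # text -> image
    loss_v2t = -log_softmax(logits, axis=0)[idx, idx].mean()  # image -> text
    return 0.5 * (loss_t2v + loss_v2t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 16))
    # Perfectly aligned pairs should score a lower loss than shifted pairs.
    print(sentence_contrastive_loss(emb, emb))
    print(sentence_contrastive_loss(emb, np.roll(emb, 1, axis=0)))
```

Minimizing a loss of this kind pulls each sentence toward its paired image and pushes it away from the other images in the batch. Because sentences from every language are aligned to the same image space, this is one way the shared space described in the abstract could arise.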

