CL · Oct 11, 2022

Like a bilingual baby: The advantage of visually grounding a bilingual language model

arXiv:2210.05487v2 · h-index: 16
AI Analysis

This work addresses the challenge of making language models more human-like for multilingual applications, but it is incremental: it builds on existing visually grounded approaches.

The authors aim to improve multilingual language models by training an LSTM on images paired with English and Spanish captions from MS-COCO-ES. They find that visual grounding improves the model's grasp of semantic similarity, both within and across languages, and lowers perplexity, but gives no significant advantage for abstract words.

Unlike most neural language models, humans learn language in a rich, multi-sensory and, often, multi-lingual environment. Current language models typically fail to fully capture the complexities of multilingual language use. We train an LSTM language model on images and captions in English and Spanish from MS-COCO-ES. We find that the visual grounding improves the model's understanding of semantic similarity both within and across languages and improves perplexity. However, we find no significant advantage of visual grounding for abstract words. Our results provide additional evidence of the advantages of visually grounded language models and point to the need for more naturalistic language data from multilingual speakers and multilingual datasets with perceptual grounding.
