CL · AI · Oct 16, 2025

MERLIN: A Testbed for Multilingual Multimodal Entity Recognition and Linking

arXiv:2510.14307v1 · h-index: 20 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work provides a new dataset and benchmarks for researchers in multilingual multimodal AI, but it is incremental as it builds on existing entity linking methods.

The paper addresses multilingual multimodal entity linking by introducing MERLIN, a testbed built on BBC news article titles in five languages, each paired with an image. The dataset contains over 7,000 entity mentions linked to 2,500 Wikidata entities. The experiments show that visual data improves linking accuracy, especially when the textual context is ambiguous and for models with weaker multilingual abilities.

This paper introduces MERLIN, a novel testbed system for the task of Multilingual Multimodal Entity Linking. The dataset consists of BBC news article titles, paired with corresponding images, in five languages (Hindi, Japanese, Indonesian, Vietnamese, and Tamil), and features over 7,000 named entity mentions linked to 2,500 unique Wikidata entities. We also include several benchmarks using multilingual and multimodal entity linking methods, exploring different language models such as LLaMa-2 and Aya-23. Our findings indicate that incorporating visual data improves the accuracy of entity linking, especially for entities where the textual context is ambiguous or insufficient, and particularly for models that do not have strong multilingual abilities. The dataset and methods are available at https://github.com/rsathya4802/merlin
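
To make the task concrete, the sketch below shows one way a single mention could be represented and how linking accuracy could be broken down by language. It is an illustration only: the `Mention` class, its field names, the `accuracy_by_language` helper, and the toy identifiers are all assumptions made here, not the schema or evaluation code from the MERLIN repository.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One entity mention in a news title (hypothetical schema, not MERLIN's format)."""
    language: str    # language code, e.g. "hi", "ja", "id", "vi", "ta"
    title: str       # article title that contains the mention
    surface: str     # mention text as it appears in the title
    image_path: str  # image paired with the title
    gold_qid: str    # gold Wikidata identifier for the mention


def accuracy_by_language(mentions, predicted_qids):
    """Return {language: share of mentions whose predicted QID equals the gold QID}."""
    if len(mentions) != len(predicted_qids):
        raise ValueError("need exactly one prediction per mention")
    correct = defaultdict(int)
    total = defaultdict(int)
    for mention, predicted in zip(mentions, predicted_qids):
        total[mention.language] += 1
        correct[mention.language] += int(predicted == mention.gold_qid)
    return {lang: correct[lang] / total[lang] for lang in total}


# Toy data with placeholder titles and identifiers (not taken from the dataset).
toy_mentions = [
    Mention("hi", "title one", "mention one", "img1.jpg", "Q1"),
    Mention("hi", "title two", "mention two", "img2.jpg", "Q2"),
    Mention("ja", "title three", "mention three", "img3.jpg", "Q3"),
]
toy_predictions = ["Q1", "Q9", "Q3"]

print(accuracy_by_language(toy_mentions, toy_predictions))
# {'hi': 0.5, 'ja': 1.0}
```

Running the same helper on predictions from a text-only model and from a model that also sees the image would give the kind of per-language comparison the abstract describes, under the assumption that both models are scored against the same gold identifiers.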

