CLIR
Nov 5, 2020

Entity Linking in 100 Languages

arXiv:2011.02690v1
1013 citations
AI Analysis

This work addresses entity linking for low-resource languages and rare entities. Its contribution is largely incremental: it builds on prior work with improved feature representation and training techniques.

The authors tackle multilingual entity linking across 100+ languages and 20 million entities by proposing a new formulation and training a single dual encoder model, which outperforms state-of-the-art results from a far more limited cross-lingual linking task.

We propose a new formulation for multilingual entity linking, where language-specific mentions resolve to a language-agnostic Knowledge Base. We train a dual encoder in this new setting, building on prior work with improved feature representation, negative mining, and an auxiliary entity-pairing task, to obtain a single entity retrieval model that covers 100+ languages and 20 million entities. The model outperforms state-of-the-art results from a far more limited cross-lingual linking task. Rare entities and low-resource languages pose challenges at this large scale, so we advocate for an increased focus on zero- and few-shot evaluation. To this end, we provide Mewsli-9, a large new multilingual dataset (http://goo.gle/mewsli-dataset) matched to our setting, and show how frequency-based analysis provided key insights for our model and training enhancements.
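To make the dual encoder retrieval idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes mentions and entities have already been encoded into vectors of the same size (the toy vectors, the entity IDs such as `ent_berlin`, and the function names are all hypothetical). It scores mentions against entities with cosine similarity and uses an in-batch softmax loss as one common way to train such a model. The paper's negative mining and auxiliary entity-pairing task are not shown.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def retrieve(mention_vec, entity_index, k=1):
    """Return the IDs of the k entities most similar to the mention."""
    ranked = sorted(
        entity_index,
        key=lambda eid: cosine(mention_vec, entity_index[eid]),
        reverse=True,
    )
    return ranked[:k]

def in_batch_softmax_loss(mention_vecs, entity_vecs, temperature=0.1):
    """Mean cross-entropy where mention i's gold entity is entity i and
    the other entities in the batch act as negatives."""
    total = 0.0
    for i, mention in enumerate(mention_vecs):
        logits = [cosine(mention, e) / temperature for e in entity_vecs]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_z = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_z - logits[i]
    return total / len(mention_vecs)

# Hypothetical language-agnostic entity index: one vector per entity ID.
ENTITY_INDEX = {
    "ent_berlin": [1.0, 0.0, 0.0],
    "ent_paris": [0.0, 1.0, 0.0],
    "ent_tokyo": [0.0, 0.0, 1.0],
}

# Hypothetical encoded mentions from two different languages.
MENTIONS = {
    ("de", "Berlin"): [0.9, 0.1, 0.0],
    ("es", "París"): [0.2, 0.8, 0.1],
}

if __name__ == "__main__":
    for (lang, text), vec in MENTIONS.items():
        print(lang, text, "->", retrieve(vec, ENTITY_INDEX))
```

Because every mention, whatever its language, is scored against the same entity index, a single model can serve all languages. At the scale the abstract describes (20 million entities), the exhaustive sort used here would normally be replaced by an approximate nearest-neighbour search.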

Code Implementations: 1 repo
