CL, Jun 30, 2023

Knowledge Base Completion for Long-Tail Entities

arXiv:2306.17472v1224 citations, h-index: 96
Originality: Incremental advance
AI Analysis

This paper addresses a gap in knowledge base coverage for long-tail entities, which matters for applications such as search and AI assistants. The advance is incremental, since it builds on existing LM-based approaches.

The paper tackles the problem of knowledge base completion for long-tail entities, which are often neglected in prior work, and introduces a novel two-stage LM-based method that outperforms baselines in F1, with significant gains in recall.

Despite their impressive scale, knowledge bases (KBs), such as Wikidata, still contain significant gaps. Language models (LMs) have been proposed as a source for filling these gaps. However, prior works have focused on prominent entities with rich coverage by LMs, neglecting the crucial case of long-tail entities. In this paper, we present a novel method for LM-based KB completion that is specifically geared for facts about long-tail entities. The method leverages two different LMs in two stages: for candidate retrieval and for candidate verification and disambiguation. To evaluate our method and various baselines, we introduce a novel dataset, called MALT, rooted in Wikidata. Our method outperforms all baselines in F1, with major gains especially in recall.
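
The two-stage idea in the abstract (one model proposes candidates, a second one verifies them) and the F1 metric it reports can be illustrated with a small sketch. This is not the authors' implementation: `complete_fact`, `toy_retrieve`, `toy_verify`, the 0.5 threshold, and the placeholder IDs are all hypothetical, and the two toy functions merely stand in for where real language models would be called.

```python
from __future__ import annotations

from typing import Callable, Iterable


def complete_fact(
    subject: str,
    relation: str,
    retrieve: Callable[[str, str], Iterable[str]],
    verify: Callable[[str, str, str], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> set[str]:
    """Stage 1 proposes candidate objects; stage 2 keeps those scoring >= threshold."""
    candidates = set(retrieve(subject, relation))
    return {c for c in candidates if verify(subject, relation, c) >= threshold}


def precision_recall_f1(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Set-based precision, recall, and F1 of predicted objects against gold objects."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy stand-ins (assumptions): a real system would query LMs here.
def toy_retrieve(subject: str, relation: str) -> list[str]:
    return ["Q1", "Q2", "Q3"]


def toy_verify(subject: str, relation: str, candidate: str) -> float:
    return {"Q1": 0.9, "Q2": 0.2, "Q3": 0.7}.get(candidate, 0.0)


if __name__ == "__main__":
    kept = complete_fact("some entity", "some relation", toy_retrieve, toy_verify)
    print(sorted(kept))
    print(precision_recall_f1(kept, {"Q1", "Q4"}))
```

Separating retrieval from verification lets the first stage favor recall by proposing generously, while the second stage filters for precision. That is one plausible reading of why a two-stage design could yield the recall gains the abstract mentions.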

Code Implementations: 1 repo
