EMERGE: A Benchmark for Updating Knowledge Graphs with Emerging Textual Knowledge
This work addresses the need for benchmarks for updating knowledge graphs with evolving knowledge, though the contribution is incremental, since it builds on existing datasets and methods.
The paper tackles the problem of automatically updating knowledge graphs over time using emerging textual knowledge, and introduces a dataset of 376K Wikipedia passages aligned with 1.25M KG edits across 10 Wikidata snapshots from 2019 to 2025.
Knowledge Graphs (KGs) are structured knowledge repositories containing entities and relations between them. In this paper, we investigate the problem of automatically updating KGs over time with respect to the evolution of knowledge in unstructured textual sources. This problem requires identifying a wide range of update operations based on the state of an existing KG at a specific point in time. This contrasts with traditional information extraction pipelines, which extract knowledge from text independently of the current state of a KG. To address this challenge, we propose a method for lifelong construction of a dataset consisting of Wikidata KG snapshots over time and Wikipedia passages paired with the corresponding edit operations that they induce in a particular KG snapshot. The resulting dataset comprises 376K Wikipedia passages aligned with a total of 1.25M KG edits over 10 different snapshots of Wikidata from 2019 to 2025. Our experimental results highlight challenges in updating KG snapshots based on emerging textual knowledge, positioning the dataset as a valuable benchmark for future research. We will publicly release our dataset and model implementations.
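To make the contrast between state-independent extraction and state-dependent updating concrete, the following is a minimal sketch and not the authors' method or data format. It assumes a KG snapshot is a set of (subject, relation, object) triples and that edits fall into two hypothetical operation types, "add" and "remove"; the abstract only says the benchmark covers a wide range of update operations. The names `Edit`, `diff_against_snapshot`, `apply_edits`, and the toy entities are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


@dataclass(frozen=True)
class Edit:
    op: str          # hypothetical labels: "add" or "remove"
    triple: Triple


def diff_against_snapshot(
    snapshot: Set[Triple],
    extracted: Set[Triple],
    single_valued: Set[str],
) -> List[Edit]:
    """Turn triples extracted from text into edits relative to a snapshot.

    `single_valued` lists relations assumed to hold one object at a time,
    so a new value supersedes the old one (an assumption of this sketch).
    """
    edits: List[Edit] = []
    for s, r, o in sorted(extracted):
        if (s, r, o) in snapshot:
            continue  # already in the KG: the passage induces no edit
        if r in single_valued:
            stale = sorted(t for t in snapshot if t[0] == s and t[1] == r)
            edits.extend(Edit("remove", t) for t in stale)
        edits.append(Edit("add", (s, r, o)))
    return edits


def apply_edits(snapshot: Set[Triple], edits: List[Edit]) -> Set[Triple]:
    """Return a new snapshot with the edits applied in order."""
    updated = set(snapshot)
    for e in edits:
        if e.op == "add":
            updated.add(e.triple)
        elif e.op == "remove":
            updated.discard(e.triple)
        else:
            raise ValueError(f"unknown edit operation: {e.op}")
    return updated


# Toy data (invented, not taken from the dataset).
snapshot = {
    ("Examplia", "head_of_state", "Alice"),
    ("Examplia", "capital", "Cityville"),
}
extracted = {
    ("Examplia", "head_of_state", "Bob"),
    ("Examplia", "capital", "Cityville"),
}
edits = diff_against_snapshot(snapshot, extracted, {"head_of_state"})
new_snapshot = apply_edits(snapshot, edits)
```

In this toy case the same extracted triples yield different edits depending on the snapshot: the capital fact is already present and produces nothing, while the head-of-state fact produces one removal and one addition. A state-independent extractor would emit both triples regardless.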