AI · Jul 1, 2022

Enriching Wikidata with Linked Open Data

Bohui Zhang, Filip Ilievski, Pedro Szekely

arXiv:2207.00143v24.52 citations · h-index: 17

Originality: Incremental advance

AI Analysis

This work addresses the gap in relevant information for users of large knowledge graphs like Wikidata, though it is incremental as it builds on existing mechanisms for entity alignment and source selection.

The paper tackles missing information in Wikidata by developing a workflow that enriches it with structured data from Linked Open Data sources, which the experiments show can add millions of novel, high-quality statements.

Large public knowledge graphs, like Wikidata, contain billions of statements about tens of millions of entities, thus inspiring various use cases to exploit such knowledge graphs. However, practice shows that much of the relevant information that fits users' needs is still missing in Wikidata, while current linked open data (LOD) tools are not suitable to enrich large graphs like Wikidata. In this paper, we investigate the potential of enriching Wikidata with structured data sources from the LOD cloud. We present a novel workflow that includes gap detection, source selection, schema alignment, and semantic validation. We evaluate our enrichment method with two complementary LOD sources: a noisy source with broad coverage, DBpedia, and a manually curated source with a narrow focus on the art domain, Getty. Our experiments show that our workflow can enrich Wikidata with millions of novel statements from external LOD sources with high quality. Property alignment and data quality are key challenges, whereas entity alignment and source selection are well-supported by existing Wikidata mechanisms. We make our code and data available to support future work.
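The four stages named in the abstract (gap detection, source selection, schema alignment, semantic validation) can be illustrated with a toy sketch. This is not the authors' implementation: the entity IDs, the `born` and `birth_date` property names, the in-memory dictionaries, and the date-format check are all hypothetical stand-ins chosen only to show how the stages might chain together.

```python
from datetime import date

# Toy Wikidata-like graph: entity -> {property: value}. All IDs are invented.
WIKIDATA = {
    "Q_A": {"birth_date": "1850-01-01"},
    "Q_B": {},
    "Q_C": {},
}

# Source selection (assumed): links from graph entities to external-source IDs.
ENTITY_LINKS = {"Q_A": "ext:a", "Q_B": "ext:b", "Q_C": "ext:c"}

# Toy external LOD source: external ID -> {external property: value}.
EXTERNAL = {
    "ext:a": {"born": "1850-01-01"},
    "ext:b": {"born": "1900-05-05"},
    "ext:c": {"born": "unknown"},  # noisy value that validation should reject
}

# Schema alignment (assumed): external property -> graph property.
PROPERTY_ALIGNMENT = {"born": "birth_date"}


def detect_gaps(graph, prop):
    """Gap detection: entities that have no value for `prop`."""
    return [entity for entity, stmts in graph.items() if prop not in stmts]


def is_valid_date(value):
    """Semantic validation stand-in: accept only ISO-formatted dates."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


VALIDATORS = {"birth_date": is_valid_date}


def enrich(graph, prop):
    """Return candidate (entity, property, value) statements that fill gaps."""
    new_statements = []
    for entity in detect_gaps(graph, prop):
        ext_id = ENTITY_LINKS.get(entity)
        if ext_id is None:
            continue  # no external counterpart known for this entity
        for ext_prop, value in EXTERNAL.get(ext_id, {}).items():
            if PROPERTY_ALIGNMENT.get(ext_prop) != prop:
                continue  # external property does not map to the target
            if VALIDATORS[prop](value):
                new_statements.append((entity, prop, value))
    return new_statements


if __name__ == "__main__":
    print(enrich(WIKIDATA, "birth_date"))
```

In this toy setup, `Q_A` already has a value and is skipped at gap detection, while the `"unknown"` value for `Q_C` is dropped at validation, which loosely mirrors the abstract's point that data quality is a key challenge when the source is noisy.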
