AI · Aug 6, 2021

Creating and Querying Personalized Versions of Wikidata on a Laptop

arXiv:2108.07119v2 · 2 citations
AI Analysis

This work addresses a bottleneck for application developers who need to run complex analyses over Wikidata, offering a practical way to build and query personalized variants of the data.

The paper tackles the problem of supporting complex queries over large fractions of Wikidata, which the existing access methods (the dumps, the API, and the SPARQL endpoint) cannot handle efficiently. It introduces KGTK Kypher, a query language and processor that lets users run such queries on a laptop, much faster than the equivalent SPARQL queries on a Wikidata clone running on a powerful server.

Application developers today have three choices for exploiting the knowledge present in Wikidata: they can download the Wikidata dumps in JSON or RDF format, they can use the Wikidata API to get data about individual entities, or they can use the Wikidata SPARQL endpoint. None of these methods can support complex, yet common, query use cases, such as retrieval of large amounts of data or aggregations over large fractions of Wikidata. This paper introduces KGTK Kypher, a query language and processor that allows users to create personalized variants of Wikidata on a laptop. We present several use cases that illustrate the types of analyses that Kypher enables users to run on the full Wikidata KG on a laptop, combining data from external resources such as DBpedia. The Kypher queries for these use cases run much faster on a laptop than the equivalent SPARQL queries on a Wikidata clone running on a powerful server with 24h time-out limits.
