AI · Aug 6, 2021

Creating and Querying Personalized Versions of Wikidata on a Laptop

Hans Chalupsky, Pedro Szekely, Filip Ilievski, Daniel Garijo, Kartik Shenoy

arXiv:2108.07119v26.12 citations

Originality: Incremental advance

AI Analysis

The paper addresses a bottleneck for application developers who need to run complex analyses on Wikidata, and offers a practical way to build personalized variants of the data.

The paper tackles the problem of supporting complex queries over large fractions of Wikidata, which the existing access methods (dumps, the API, and the SPARQL endpoint) cannot handle efficiently. It introduces KGTK Kypher, a query language and processor that lets users run such queries on a laptop, much faster than the equivalent SPARQL queries on a powerful server.

Application developers today have three choices for exploiting the knowledge present in Wikidata: they can download the Wikidata dumps in JSON or RDF format, they can use the Wikidata API to get data about individual entities, or they can use the Wikidata SPARQL endpoint. None of these methods can support complex, yet common, query use cases, such as retrieval of large amounts of data or aggregations over large fractions of Wikidata. This paper introduces KGTK Kypher, a query language and processor that allows users to create personalized variants of Wikidata on a laptop. We present several use cases that illustrate the types of analyses that Kypher enables users to run on the full Wikidata KG on a laptop, combining data from external resources such as DBpedia. The Kypher queries for these use cases run much faster on a laptop than the equivalent SPARQL queries on a Wikidata clone running on a powerful server with 24h time-out limits.
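To make "aggregations over large fractions of Wikidata" concrete, the sketch below counts instances per class over a local edge table. It is not Kypher syntax and not the authors' implementation: it uses Python's standard sqlite3 module as a stand-in, and the three-column (node1, label, node2) layout, the toy edges, and the function names are assumptions made for illustration.

```python
import sqlite3

# Toy edge list in an assumed (node1, label, node2) layout; illustrative only.
TOY_EDGES = [
    ("Q42", "P31", "Q5"),
    ("Q937", "P31", "Q5"),
    ("Q64", "P31", "Q515"),
    ("Q42", "P106", "Q36180"),
]


def load_edges(edges):
    """Load edges into an in-memory SQLite table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (node1 TEXT, label TEXT, node2 TEXT)")
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", edges)
    # An index on the property column keeps the filter cheap on larger tables.
    conn.execute("CREATE INDEX idx_edges_label ON edges (label)")
    return conn


def count_instances_per_class(conn, prop="P31"):
    """Return (class, count) pairs for one property, largest count first."""
    cursor = conn.execute(
        "SELECT node2, COUNT(*) AS n FROM edges "
        "WHERE label = ? GROUP BY node2 ORDER BY n DESC, node2",
        (prop,),
    )
    return cursor.fetchall()


conn = load_edges(TOY_EDGES)
result = count_instances_per_class(conn)
print(result)
```

Against a public SPARQL endpoint, the same kind of group-and-count over every edge of a property is the sort of query that tends to hit time-out limits; running it against a local copy of the data avoids that constraint.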
