AI, DB, LG | Aug 21, 2023

KGrEaT: A Framework to Evaluate Knowledge Graphs via Downstream Tasks

Nicolas Heist, Sven Hertling, Heiko Paulheim

arXiv:2308.10537v1 | 10.0 | 12 citations | h-index: 45

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for more practical evaluation methods in knowledge graph research, though it is incremental, building on existing task-based approaches.

The paper tackles the problem of evaluating knowledge graphs by proposing KGrEaT, a framework that assesses their quality through downstream tasks like classification and recommendation, rather than relying on traditional metrics like correctness and completeness.

In recent years, countless research papers have addressed the topics of knowledge graph creation, extension, or completion in order to create knowledge graphs that are larger, more correct, or more diverse. This research is typically motivated by the argument that using such enhanced knowledge graphs to solve downstream tasks will improve performance. Nonetheless, this assumption is hardly ever evaluated. Instead, the predominant evaluation metrics, which aim at correctness and completeness, are undoubtedly valuable but fail to capture the complete picture, i.e., how useful the created or enhanced knowledge graph actually is. Further, the accessibility of such a knowledge graph is rarely considered (e.g., whether it contains expressive labels, descriptions, and sufficient context information to link textual mentions to the entities of the knowledge graph). To better judge how well knowledge graphs perform on actual tasks, we present KGrEaT, a framework to estimate the quality of knowledge graphs via actual downstream tasks like classification, clustering, or recommendation. Instead of comparing different methods of processing knowledge graphs with respect to a single task, the purpose of KGrEaT is to compare various knowledge graphs as such by evaluating them on a fixed task setup. The framework takes a knowledge graph as input, automatically maps it to the datasets to be evaluated on, and computes performance metrics for the defined tasks. It is built in a modular way to be easily extendable with additional tasks and datasets.
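The pipeline described in the abstract (a knowledge graph as input, an automatic mapping to the evaluation datasets, per-task metrics, and a modular set of tasks) can be illustrated with a minimal Python sketch. This is not KGrEaT's actual code or API: the classes `KnowledgeGraph` and `Dataset`, the functions `register_task`, `map_entities`, `nn_classification`, and `evaluate`, the exact-label matching used for mapping, and the 1-nearest-neighbour classifier are all hypothetical simplifications chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical, heavily simplified stand-ins; not KGrEaT's actual classes or API.
@dataclass
class KnowledgeGraph:
    labels: Dict[str, str]           # entity URI -> human-readable label
    vectors: Dict[str, List[float]]  # entity URI -> feature vector (e.g. a graph embedding)

@dataclass
class Dataset:
    name: str
    task: str                        # name of the registered task used for this dataset
    gold: Dict[str, str]             # textual entity mention -> gold class label

Mapping = Dict[str, Optional[str]]
TaskFn = Callable[[KnowledgeGraph, Mapping, Dataset], float]
TASKS: Dict[str, TaskFn] = {}

def register_task(name: str) -> Callable[[TaskFn], TaskFn]:
    """Decorator that adds a task to the registry, so new tasks plug in without changing evaluate()."""
    def decorator(fn: TaskFn) -> TaskFn:
        TASKS[name] = fn
        return fn
    return decorator

def map_entities(kg: KnowledgeGraph, mentions: Iterable[str]) -> Mapping:
    """Link each mention to a KG entity by case-insensitive exact label match (None if no match)."""
    by_label: Dict[str, str] = {}
    for uri, label in kg.labels.items():
        by_label.setdefault(label.casefold(), uri)  # first entity wins on duplicate labels
    return {m: by_label.get(m.casefold()) for m in mentions}

@register_task("classification")
def nn_classification(kg: KnowledgeGraph, mapping: Mapping, ds: Dataset) -> float:
    """Leave-one-out 1-nearest-neighbour accuracy; unmapped mentions count as errors."""
    mapped = [(m, mapping[m]) for m in ds.gold if mapping.get(m) in kg.vectors]
    correct = 0
    for mention, uri in mapped:
        others = [(o, u) for o, u in mapped if o != mention]
        if not others:
            continue
        nearest, _ = min(others, key=lambda pair: math.dist(kg.vectors[uri], kg.vectors[pair[1]]))
        if ds.gold[nearest] == ds.gold[mention]:
            correct += 1
    return correct / len(ds.gold) if ds.gold else 0.0

def evaluate(kg: KnowledgeGraph, datasets: Iterable[Dataset]) -> Dict[str, Dict[str, float]]:
    """Evaluate one KG on a fixed set of datasets, reporting mapping coverage and task score."""
    results: Dict[str, Dict[str, float]] = {}
    for ds in datasets:
        mapping = map_entities(kg, ds.gold)
        coverage = sum(uri is not None for uri in mapping.values()) / max(len(mapping), 1)
        results[ds.name] = {"coverage": coverage, "score": TASKS[ds.task](kg, mapping, ds)}
    return results

if __name__ == "__main__":
    kg = KnowledgeGraph(
        labels={"ex:Berlin": "Berlin", "ex:Paris": "Paris", "ex:Rhine": "Rhine", "ex:Danube": "Danube"},
        vectors={"ex:Berlin": [0.0, 0.0], "ex:Paris": [0.1, 0.0],
                 "ex:Rhine": [5.0, 5.0], "ex:Danube": [5.1, 5.0]},
    )
    toy = Dataset("toy-cities-rivers", "classification",
                  {"berlin": "City", "Paris": "City", "Rhine": "River", "Danube": "River", "Nile": "River"})
    print(evaluate(kg, [toy]))  # "Nile" has no KG entity, so coverage and score are both 0.8
```

Counting unmapped mentions as errors is a deliberate assumption in this sketch: a graph whose labels cannot be linked to the dataset's mentions scores lower even if its vectors are good, which mirrors the abstract's point that accessibility matters alongside correctness and completeness. Adding a clustering or recommendation task would only require another `@register_task` function; the graph under test and the datasets stay fixed, so different graphs can be compared on the same setup.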
