The DLCC Node Classification Benchmark for Analyzing Knowledge Graph Embeddings
This work provides a new benchmark for analyzing knowledge graph embeddings, addressing a gap in understanding their learned representations. The contribution is incremental, since it builds on existing evaluation methods.
The authors address the problem of evaluating what information knowledge graph embeddings actually learn by introducing the DLCC benchmark, which analyzes embedding approaches by the kinds of classes they can represent. They find that many constructors are learned through correlated patterns rather than the patterns defined in the gold standard, and that cardinality constraints are particularly difficult for most embeddings.
Knowledge graph embedding is a representation learning technique that projects the entities and relations of a knowledge graph into continuous vector spaces. Embeddings have seen wide uptake and are heavily used in link prediction and other downstream prediction tasks. Most approaches are evaluated on a single task or a single group of tasks to determine their overall performance, and are then assessed in terms of how well they perform on the task at hand. However, what information the embedding approaches actually learn to represent is rarely evaluated (and often not even deeply understood). To fill this gap, we present the DLCC (Description Logic Class Constructors) benchmark, a resource for analyzing embedding approaches in terms of which kinds of classes they can represent. Two gold standards are presented, one based on the real-world knowledge graph DBpedia and one synthetic. In addition, an evaluation framework is provided that implements an experiment protocol so that researchers can use the gold standards directly. To demonstrate the use of DLCC, we compare multiple embedding approaches using the gold standards. We find that many DL constructors on DBpedia are actually learned by recognizing correlated patterns different from those defined in the gold standard, and that specific DL constructors, such as cardinality constraints, are particularly hard for most embedding approaches to learn.
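The node classification idea behind such a benchmark can be illustrated with a minimal sketch. It assumes that a gold standard supplies, for each class, a list of positive and a list of negative entities, and that an embedding is a mapping from entity names to vectors. The function names, the toy data, the 80/20 split, and the nearest-centroid classifier are all hypothetical choices made for illustration; they are not taken from the DLCC evaluation framework, whose actual classifiers and protocol are not described here.

```python
import random


def nearest_centroid_accuracy(embeddings, positives, negatives,
                              train_fraction=0.8, seed=0):
    """Fit a nearest-centroid classifier on entity vectors; return test accuracy.

    Hypothetical stand-in for a benchmark classifier, not the DLCC protocol.
    """
    rng = random.Random(seed)
    labelled = [(e, 1) for e in positives] + [(e, 0) for e in negatives]
    rng.shuffle(labelled)
    split = int(len(labelled) * train_fraction)
    train, test = labelled[:split], labelled[split:]

    def centroid(label):
        # Component-wise mean of all training vectors with the given label.
        vectors = [embeddings[e] for e, y in train if y == label]
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    centroids = {0: centroid(0), 1: centroid(1)}

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    correct = 0
    for entity, label in test:
        # Predict the label whose centroid is closest to the entity vector.
        predicted = min(
            centroids,
            key=lambda c: squared_distance(embeddings[entity], centroids[c]),
        )
        correct += predicted == label
    return correct / len(test)


def make_toy_embeddings(n_per_class=20, seed=0):
    """Build invented 2-d vectors: positives near (1, 1), negatives near (-1, -1)."""
    rng = random.Random(seed)
    embeddings, positives, negatives = {}, [], []
    for i in range(n_per_class):
        pos, neg = f"pos_{i}", f"neg_{i}"
        embeddings[pos] = [1 + rng.uniform(-0.2, 0.2) for _ in range(2)]
        embeddings[neg] = [-1 + rng.uniform(-0.2, 0.2) for _ in range(2)]
        positives.append(pos)
        negatives.append(neg)
    return embeddings, positives, negatives


embeddings, positives, negatives = make_toy_embeddings()
accuracy = nearest_centroid_accuracy(embeddings, positives, negatives)
print(f"test accuracy: {accuracy:.2f}")
```

In this toy setting the two groups are trivially separable, so the classifier scores perfectly. With real embeddings, a high score for a class would suggest that the embedding encodes some signal that separates its members from non-members, although, as the abstract notes, that signal may be a correlated pattern rather than the class constructor itself.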