AI · LG · Jun 6, 2023

Schema First! Learn Versatile Knowledge Graph Embeddings by Capturing Semantics with MASCHInE

arXiv:2306.03659v2 · 5 citations · h-index: 45
Originality: Incremental advance
AI Analysis

This work addresses the need for versatile and semantically meaningful knowledge graph embeddings for tasks like entity clustering, node classification, and link prediction, offering an incremental improvement over existing methods.

The paper tackles two problems with knowledge graph embeddings (KGEs): they are task-dependent, and they often fail to capture semantics. It proposes MASCHInE, a method that uses protographs to capture the semantics of a KG. The resulting KGEs are more versatile: they perform substantially better on entity clustering and node classification, and in link prediction they yield more semantically valid predictions at equivalent rank-based performance.

Knowledge graph embedding models (KGEMs) have gained considerable traction in recent years. These models learn a vector representation of knowledge graph entities and relations, a.k.a. knowledge graph embeddings (KGEs). Learning versatile KGEs is desirable as it makes them useful for a broad range of tasks. However, KGEMs are usually trained for a specific task, which makes their embeddings task-dependent. In parallel, the widespread assumption that KGEMs actually create a semantic representation of the underlying entities and relations (e.g., project similar entities closer than dissimilar ones) has been challenged. In this work, we design heuristics for generating protographs -- small, modified versions of a KG that leverage RDF/S information. The learnt protograph-based embeddings are meant to encapsulate the semantics of a KG, and can be leveraged in learning KGEs that, in turn, also better capture semantics. Extensive experiments on various evaluation benchmarks demonstrate the soundness of this approach, which we call Modular and Agnostic SCHema-based Integration of protograph Embeddings (MASCHInE). In particular, MASCHInE helps produce more versatile KGEs that yield substantially better performance for entity clustering and node classification tasks. For link prediction, using MASCHInE substantially increases the number of semantically valid predictions with equivalent rank-based performance.
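
To make the protograph idea concrete, here is a minimal sketch of one way a schema-level graph could be derived from RDF/S information. It assumes, purely for illustration, that each relation with a declared `rdfs:domain` and `rdfs:range` contributes one class-level triple (domain, relation, range). The abstract does not spell out the actual heuristics, so the function name `build_protograph` and the toy data below are hypothetical and should not be read as the authors' implementation.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def build_protograph(
    triples: Iterable[Triple],
    domains: Dict[str, str],
    ranges: Dict[str, str],
) -> Set[Triple]:
    """Return class-level triples for relations that have both a domain and a range.

    Assumption (not from the source): one protograph triple per relation,
    linking the relation's domain class to its range class.
    """
    relations = {relation for _, relation, _ in triples}
    return {
        (domains[relation], relation, ranges[relation])
        for relation in relations
        if relation in domains and relation in ranges
    }


# Toy knowledge graph (hypothetical data).
kg = [
    ("Paris", "capitalOf", "France"),
    ("Berlin", "capitalOf", "Germany"),
    ("Paris", "nickname", "City of Light"),  # no domain/range declared
]
domains = {"capitalOf": "City"}
ranges = {"capitalOf": "Country"}

protograph = build_protograph(kg, domains, ranges)
print(protograph)
```

In this toy setting the two entity-level `capitalOf` triples collapse into a single class-level triple, which illustrates why a protograph is much smaller than the KG it is derived from. Relations without schema information are simply skipped under this assumption.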

Code Implementations: 1 repo
