AI · IR · LG · Dec 16, 2023

Do Similar Entities have Similar Embeddings?

arXiv:2312.10370v2 · 13 citations · h-index: 13 · Has Code · ESWC
Originality: Incremental advance
AI Analysis

This work addresses a critical gap for researchers and practitioners using KGEMs in downstream applications like recommender systems, revealing that standard link prediction metrics may not reflect entity similarity preservation.

The paper challenges the assumption that knowledge graph embedding models (KGEMs) inherently preserve, in their embedding space, the entity similarity found in the graph structure. Through extensive experiments, it finds that KGEMs often fail to cluster similar entities effectively, with performance varying across models and datasets.

Knowledge graph embedding models (KGEMs) developed for link prediction learn vector representations for entities in a knowledge graph, known as embeddings. A common tacit assumption is the KGE entity similarity assumption, which states that these KGEMs retain the graph's structure within their embedding space, i.e., position similar entities within the graph close to one another. This desirable property makes KGEMs widely used in downstream tasks such as recommender systems or drug repurposing. Yet, the relation between entity similarity and similarity in the embedding space has rarely been formally evaluated. Typically, KGEMs are assessed solely on their link prediction capabilities, using rank-based metrics such as Hits@K or Mean Rank. This paper challenges the prevailing assumption that entity similarity in the graph is inherently mirrored in the embedding space. To that end, we conduct extensive experiments to measure the capability of KGEMs to cluster similar entities together, and investigate the nature of the underlying factors. Moreover, we study whether different KGEMs expose a different notion of similarity. Datasets, pre-trained embeddings and code are available at: https://github.com/nicolas-hbt/similar-embeddings/.
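To make the question in the abstract concrete, here is a minimal sketch of one way to compare similarity in the graph with similarity in the embedding space. It is not the paper's evaluation protocol. It assumes that graph similarity can be approximated by the Jaccard overlap of each entity's outgoing (relation, tail) pairs and that embedding similarity is cosine similarity. All function names and the toy data are hypothetical.

```python
import math


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def top_k(entity, candidates, score, k):
    """The k candidates most similar to `entity` (ties broken by name)."""
    others = [c for c in candidates if c != entity]
    return set(sorted(others, key=lambda c: (-score(entity, c), c))[:k])


def neighbourhood_overlap(triples, embeddings, k):
    """Average share of top-k neighbours on which graph and embeddings agree.

    Assumption: an entity is described in the graph by the set of
    (relation, tail) pairs of the triples in which it is the head.
    """
    entities = sorted(embeddings)
    signature = {e: set() for e in entities}
    for head, relation, tail in triples:
        if head in signature:
            signature[head].add((relation, tail))

    def graph_sim(a, b):
        return jaccard(signature[a], signature[b])

    def embedding_sim(a, b):
        return cosine(embeddings[a], embeddings[b])

    total = 0.0
    for e in entities:
        in_graph = top_k(e, entities, graph_sim, k)
        in_space = top_k(e, entities, embedding_sim, k)
        total += len(in_graph & in_space) / k
    return total / len(entities)


# Toy graph: a and b share a (relation, tail) pair, and so do c and d.
toy_triples = [("a", "r", "x"), ("b", "r", "x"), ("c", "r", "y"), ("d", "r", "y")]
# Embeddings that respect the graph similarity, and embeddings that do not.
aligned = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
scrambled = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1], "d": [0.1, 0.9]}
```

Under these assumptions, `neighbourhood_overlap(toy_triples, aligned, k=1)` is high when nearest neighbours in the embedding space are also the most similar entities in the graph, and `neighbourhood_overlap(toy_triples, scrambled, k=1)` drops when they are not. A model could in principle score well on Hits@K or Mean Rank while scoring poorly on an overlap measure of this kind, which is the mismatch the abstract describes.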

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
