CL · Nov 2, 2022

Hierarchies over Vector Space: Orienting Word and Graph Embeddings

arXiv:2211.01430v2 · 0.61 citations · h-index: 57

Originality: Incremental advance

AI Analysis

This work addresses the need for hierarchical structure in embeddings for tasks such as relation discovery and link recovery. It is an incremental advance: rather than proposing a new embedding method, it builds on existing embeddings and contributes a new tree-construction algorithm over them.

The paper tackles the problem of recovering hierarchical properties from unordered, flat embedding spaces such as word and graph embeddings. It builds a directed rooted tree in which entities are inserted in descending order of power (e.g., word frequency) and each entity is attached to its closest more powerful node. It reports average scores of 8.98% for hypernym discovery and 2.70% for LCA discovery across five languages, and 62.76% accuracy on directed Wikipedia link recovery, all substantially above baselines.

Word and graph embeddings are widely used in deep learning applications. We present a data structure that captures inherent hierarchical properties from an unordered flat embedding space, particularly a sense of direction between pairs of entities. Inspired by the notion of "distributional generality", our algorithm constructs an arborescence (a directed rooted tree) by inserting nodes in descending order of entity power (e.g., word frequency), pointing each entity to the closest more powerful node as its parent. We evaluate the resulting tree structures on three tasks: hypernym relation discovery, least-common-ancestor (LCA) discovery among words, and Wikipedia page link recovery. We achieve average scores of 8.98% and 2.70% for hypernym and LCA discovery across five languages and 62.76% accuracy on directed Wiki-page link recovery, both substantially above baselines. Finally, we investigate the effect of insertion order, the power/similarity trade-off, and various power sources to optimize parent selection.
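The insertion procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes "closest" to mean highest cosine similarity, uses toy two-dimensional vectors and made-up power scores, and the function names (build_arborescence, ancestors, lca, is_hypernym) are hypothetical. It also shows how the resulting parent map can answer the two word-level queries the paper evaluates: "is X an ancestor (hypernym candidate) of Y?" and "what is the least common ancestor of X and Y?".

```python
import math
from typing import Dict, List, Optional, Sequence

Parent = Dict[str, Optional[str]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_arborescence(vectors: Dict[str, Sequence[float]],
                       power: Dict[str, float]) -> Parent:
    """Hypothetical sketch: insert entities by descending power; each new
    entity's parent is the most similar entity already in the tree."""
    # Descending power; ties broken alphabetically so the result is deterministic.
    order = sorted(vectors, key=lambda e: (-power[e], e))
    parent: Parent = {order[0]: None}  # the most powerful entity becomes the root
    inserted: List[str] = [order[0]]
    for entity in order[1:]:
        # Every inserted node is at least as powerful, so the parent is the
        # nearest such node under the assumed cosine-similarity measure.
        parent[entity] = max(inserted,
                             key=lambda p: cosine(vectors[entity], vectors[p]))
        inserted.append(entity)
    return parent


def ancestors(parent: Parent, node: str) -> List[str]:
    """Path from node up to the root, including node itself."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


def lca(parent: Parent, a: str, b: str) -> str:
    """Least common ancestor: first node on b's root path that is also on a's."""
    on_a_path = set(ancestors(parent, a))
    for node in ancestors(parent, b):
        if node in on_a_path:
            return node
    raise ValueError("nodes are not in the same tree")


def is_hypernym(parent: Parent, hyper: str, hypo: str) -> bool:
    """Treat any strict ancestor of hypo as a predicted hypernym."""
    return hyper in ancestors(parent, hypo)[1:]


# Toy data (invented for illustration only).
vectors = {"animal": [1.0, 1.0], "dog": [1.0, 0.0],
           "cat": [0.0, 1.0], "puppy": [0.9, 0.1]}
power = {"animal": 100, "dog": 50, "cat": 40, "puppy": 5}

parent = build_arborescence(vectors, power)
print(parent)                             # {'animal': None, 'dog': 'animal', 'cat': 'animal', 'puppy': 'dog'}
print(lca(parent, "puppy", "cat"))        # animal
print(is_hypernym(parent, "dog", "puppy"))  # True
```

Because each entity only ever points to a node inserted before it, the parent map is guaranteed to be a single tree rooted at the most powerful entity, so `lca` always succeeds. The linear scan over already-inserted nodes makes construction quadratic in vocabulary size; for realistic vocabularies a nearest-neighbor index would be the natural substitute.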
