Neighborhood Growth Determines Geometric Priors for Relational Representation Learning
This paper addresses the challenge of selecting appropriate geometric priors for representation learning in heterogeneous graphs. The contribution is incremental in that it builds on classical embeddability methods.
The paper tackles the problem of identifying suitable geometric spaces (Euclidean, Hyperbolic, or Spherical) for embedding heterogeneous relational data by analyzing nearest-neighbor structures and local neighborhood growth rates, and validates the method on benchmark datasets with comparisons to optimization-based approaches.
The problem of identifying geometric structure in heterogeneous, high-dimensional data is a cornerstone of representation learning. While there exists a large body of literature on the embeddability of canonical graphs, such as lattices or trees, the heterogeneity of the relational data typically encountered in practice limits the applicability of these classical methods. In this paper, we propose a combinatorial approach to evaluating embeddability, i.e., to deciding whether a data set is best represented in Euclidean, Hyperbolic or Spherical space. Our method analyzes nearest-neighbor structures and local neighborhood growth rates to identify the geometric priors of suitable embedding spaces. For canonical graphs, the algorithm's prediction provably matches classical results. For large, heterogeneous graphs, we introduce an efficiently computable statistic that approximates the algorithm's decision rule. We validate our method over a range of benchmark data sets and compare with recently published optimization-based embeddability methods.
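To make the notion of local neighborhood growth concrete, the following is a minimal sketch and not the paper's algorithm or statistic. It counts the nodes within r hops of a source node by breadth-first search, then checks whether the counts are better described by polynomial growth (as in a lattice, which suggests a Euclidean prior) or exponential growth (as in a tree, which suggests a Hyperbolic prior). The function names `ball_sizes` and `growth_type`, the least-squares comparison, and the default radius of 6 are all assumptions made for illustration. The spherical case is not covered.

```python
from collections import deque
import math


def ball_sizes(adj, source, max_radius):
    """Return [|B_1|, ..., |B_R|], where B_r is the set of nodes within r hops of source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_radius:
            continue  # do not expand beyond the largest radius of interest
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    counts = [0] * (max_radius + 1)
    for d in dist.values():
        counts[d] += 1
    sizes, total = [], 0
    for r in range(max_radius + 1):
        total += counts[r]
        sizes.append(total)
    return sizes[1:]  # drop the trivial radius-0 ball


def _squared_residual(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def growth_type(adj, source, max_radius=6):
    """Illustrative heuristic: compare a log-linear fit against a log-log fit."""
    sizes = ball_sizes(adj, source, max_radius)
    radii = list(range(1, max_radius + 1))
    log_sizes = [math.log(s) for s in sizes]
    exp_res = _squared_residual(radii, log_sizes)  # log|B_r| linear in r
    poly_res = _squared_residual([math.log(r) for r in radii], log_sizes)  # linear in log r
    return "exponential" if exp_res < poly_res else "polynomial"


def grid_graph(n):
    """n x n square lattice as an adjacency dict keyed by (row, col)."""
    adj = {(i, j): [] for i in range(n) for j in range(n)}
    for i, j in adj:
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (i + di, j + dj) in adj:
                adj[(i, j)].append((i + di, j + dj))
    return adj


def binary_tree(depth):
    """Complete binary tree with heap-style node labels 1 .. 2^(depth+1) - 1."""
    last = 2 ** (depth + 1) - 1
    adj = {v: [] for v in range(1, last + 1)}
    for v in range(2, last + 1):
        adj[v].append(v // 2)
        adj[v // 2].append(v)
    return adj


if __name__ == "__main__":
    print(growth_type(grid_graph(15), (7, 7)))  # lattice, measured from its center
    print(growth_type(binary_tree(8), 1))       # tree, measured from its root
```

On these two canonical graphs the heuristic separates the lattice from the tree, in line with the classical picture. On heterogeneous graphs a single source node is unlikely to be representative, so any practical variant would need to aggregate over many nodes and radii.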