Information-Theoretic Quality Metric of Low-Dimensional Embeddings
This work addresses the need for better quality assessment of embeddings in information-sensitive applications like early-warning indicators, though it appears incremental as it builds on existing metrics.
The authors tackled the problem of evaluating low-dimensional embeddings by introducing the Entropy Rank Preservation Measure (ERPM), an information-theoretic metric based on Shannon entropy and stable rank, which quantifies information loss during projection. They validated ERPM by comparing it with distance-based and geometric metrics on financial time series and manifold data, finding that ERPM complements existing methods by identifying neighborhoods with severe information loss.
In this work we study the quality of low-dimensional embeddings from an explicitly information-theoretic perspective. Classical evaluation metrics such as stress, rank-based neighborhood criteria, and Local Procrustes quantify distortions in distances or in local geometry, but they do not directly assess how much information is preserved when high-dimensional data are projected onto a lower-dimensional space. To address this limitation, we introduce the Entropy Rank Preservation Measure (ERPM), a local metric based on the Shannon entropy of the singular-value spectrum of neighborhood matrices and on the stable rank. ERPM quantifies the change in uncertainty between the original representation and its reduced projection, and it provides both neighborhood-level indicators and a global summary statistic. To validate the metric, we compare its results with the Mean Relative Rank Error (MRRE), which is distance-based, and with Local Procrustes, which is based on geometric properties, using a financial time series and a manifold commonly studied in the literature. We observe that distance-based criteria exhibit very low correlation with geometric and spectral measures, whereas ERPM and Local Procrustes are strongly correlated on average but diverge significantly in local regimes. We conclude that ERPM complements existing metrics by identifying neighborhoods with severe information loss, enabling a more comprehensive assessment of embeddings, particularly in information-sensitive applications such as the construction of early-warning indicators.
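The abstract names the two spectral ingredients of ERPM but does not state how they are combined, so the following is only a minimal sketch of those ingredients, not the authors' metric. It assumes that the spectrum is normalized by the sum of singular values before taking the entropy, that neighborhoods are k-nearest neighbors found in the original space, and that each neighborhood matrix is mean-centered. All function names (`spectral_entropy`, `stable_rank`, `knn_indices`, `local_spectral_summary`) are hypothetical and chosen for illustration.

```python
import numpy as np


def spectral_entropy(A, eps=1e-12):
    """Shannon entropy (in nats) of the singular-value spectrum of A.

    Assumption: singular values are normalized to sum to one; the paper
    may use a different normalization (e.g. squared singular values).
    """
    s = np.linalg.svd(A, compute_uv=False)
    total = s.sum()
    if total <= eps:
        return 0.0
    p = s / total
    p = p[p > eps]  # drop numerically zero components (0 * log 0 := 0)
    return float(-(p * np.log(p)).sum())


def stable_rank(A, eps=1e-12):
    """Stable rank: squared Frobenius norm divided by squared spectral norm."""
    s = np.linalg.svd(A, compute_uv=False)  # sorted in descending order
    if s[0] <= eps:
        return 0.0
    return float((s ** 2).sum() / s[0] ** 2)


def knn_indices(X, k):
    """Brute-force indices of the k nearest neighbors of each row of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # exclude each point from its own neighborhood
    return np.argsort(d2, axis=1)[:, :k]


def local_spectral_summary(X_high, X_low, k=10):
    """Per-point spectral quantities before and after projection.

    Returns an array of shape (n, 4) with columns
    (entropy_high, entropy_low, stable_rank_high, stable_rank_low).
    Assumption: neighborhoods are defined in the original space and
    each neighborhood matrix is mean-centered.
    """
    nbrs = knn_indices(X_high, k)
    rows = []
    for idx in nbrs:
        N_high = X_high[idx] - X_high[idx].mean(axis=0)
        N_low = X_low[idx] - X_low[idx].mean(axis=0)
        rows.append((
            spectral_entropy(N_high),
            spectral_entropy(N_low),
            stable_rank(N_high),
            stable_rank(N_low),
        ))
    return np.array(rows)
```

Under these assumptions, a large drop from the first column to the second (or from the third to the fourth) flags a neighborhood whose spectrum collapsed under projection, which is the kind of local information loss the abstract describes. How such differences are scaled and aggregated into the global statistic is left open here because the abstract does not specify it.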