Minimax Rates for Hyperbolic Hierarchical Learning
This work addresses the problem of efficient learning on hierarchical data for machine learning practitioners, providing a theoretical foundation for using hyperbolic geometry to overcome limitations of Euclidean spaces.
The paper demonstrates that hyperbolic representations enable learning on hierarchical data with sample complexity scaling linearly in depth, whereas Euclidean representations require exponential sample complexity due to geometric obstructions. Specifically, hyperbolic embeddings achieve optimal sample complexity of O(mR log m) for depth-R hierarchies with branching factor m, while Euclidean embeddings necessitate exponential scaling.
We prove an exponential separation in sample complexity between Euclidean and hyperbolic representations for learning on hierarchical data under standard Lipschitz regularization. For depth-$R$ hierarchies with branching factor $m$, we first establish a geometric obstruction for Euclidean space: any bounded-radius embedding forces volumetric collapse, mapping exponentially many tree-distant points to nearby locations. This necessitates Lipschitz constants scaling as $\exp(Ω(R))$ to realize even simple hierarchical targets, yielding exponential sample complexity under capacity control. We then show this obstruction vanishes in hyperbolic space: constant-distortion hyperbolic embeddings admit $O(1)$-Lipschitz realizability, enabling learning with $n = O(mR \log m)$ samples. A matching $Ω(mR \log m)$ lower bound via Fano's inequality establishes that hyperbolic representations achieve the information-theoretic optimum. We also show a geometry-independent bottleneck: any rank-$k$ prediction space captures only $O(k)$ canonical hierarchical contrasts.