CL, Dec 29, 2025

Geometric Patterns of Meaning: A PHATE Manifold Analysis of Multi-lingual Embeddings

arXiv:2601.09731v1
h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better tools to validate how well embedding models capture semantic relationships across languages, though it is incremental: it applies existing manifold learning techniques to new linguistic data.

The paper analyzes semantic geometry in multilingual embeddings through a multi-level framework built on PHATE manifold learning. It reports systematic patterns and critical limitations, such as geometric collapse at the sub-character level and distinct geometric signatures across writing systems.

We introduce a multi-level analysis framework for examining semantic geometry in multilingual embeddings, implemented through Semanscope (a visualization tool that applies PHATE manifold learning across four linguistic levels). Analysis of diverse datasets spanning sub-character components, alphabetic systems, semantic domains, and numerical concepts reveals systematic geometric patterns and critical limitations in current embedding models. At the sub-character level, purely structural elements (Chinese radicals) exhibit geometric collapse, highlighting model failures to distinguish semantic from structural components. At the character level, different writing systems show distinct geometric signatures. At the word level, content words form clustering-branching patterns across 20 semantic domains in English, Chinese, and German. Arabic numbers organize through spiral trajectories rather than clustering, violating standard distributional semantics assumptions. These findings establish PHATE manifold learning as an essential analytic tool not only for studying geometric structure of meaning in embedding space, but also for validating the effectiveness of embedding models in capturing semantic relationships.
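To make the idea of projecting embeddings with a PHATE-style method concrete, here is a minimal sketch in NumPy. It is not Semanscope's code and not the reference PHATE implementation: it swaps in a fixed-bandwidth Gaussian kernel, a hand-picked diffusion time `t`, and classical MDS, where PHATE proper uses an adaptive kernel, an automatically chosen `t`, and metric MDS. The function name `phate_like_embedding` and the synthetic two-cluster data are assumptions made for illustration; real use would replace `X` with vectors from an embedding model.

```python
import numpy as np

def phate_like_embedding(X, t=4, sigma=None, n_components=2, eps=1e-7):
    """Simplified PHATE-style projection (illustrative sketch only)."""
    # Pairwise squared Euclidean distances between input vectors.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Gaussian affinities; bandwidth defaults to the median pairwise distance
    # (assumption: PHATE itself uses an adaptive alpha-decay kernel).
    if sigma is None:
        sigma = np.sqrt(np.median(d2[d2 > 0]))
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    # Row-normalize into a Markov transition matrix, then diffuse t steps.
    P = K / K.sum(axis=1, keepdims=True)
    Pt = np.linalg.matrix_power(P, t)
    # Log-transform diffusion probabilities into "potentials".
    U = -np.log(Pt + eps)
    # Squared distances between potential rows.
    usq = np.sum(U ** 2, axis=1)
    pd2 = np.maximum(usq[:, None] + usq[None, :] - 2.0 * U @ U.T, 0.0)
    # Classical MDS: double-center, then keep the top eigenvectors.
    n = X.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ pd2 @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# Synthetic stand-in for word embeddings: two well-separated clusters.
rng = np.random.default_rng(0)
cluster_a = rng.normal(0.0, 0.1, size=(20, 16))
cluster_b = rng.normal(1.0, 0.1, size=(20, 16))
X = np.vstack([cluster_a, cluster_b])

Y = phate_like_embedding(X)
print(Y.shape)  # (40, 2)
```

Each row of `Y` is a 2-D coordinate for one input vector, so the result can be passed directly to a scatter plot to look for clustering, branching, or collapse of the kind the abstract describes.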
