CL · AI · AT · CD · Nov 16, 2023

A Language and Its Dimensions: Intrinsic Dimensions of Language Fractal Structures

arXiv:2311.10217v2 · 11 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses a foundational problem in linguistics and AI by providing a novel mathematical framework for understanding language structure, though it is incremental as it builds on existing fractal and embedding concepts.

The paper tackles the problem of characterizing the fractal nature of language by introducing language fractal structures, estimating their intrinsic dimensions for Russian and English using topological data analysis and minimum spanning tree methods, and finding non-integer values close to 9 for both languages.

The present paper introduces a novel object of study: a language fractal structure. We hypothesize that a set of embeddings of all $n$-grams of a natural language constitutes a representative sample of this fractal set. (We use the term Hailonakea to refer to the sum total of all language fractal structures, over all $n$.) The paper estimates intrinsic (genuine) dimensions of language fractal structures for the Russian and English languages. To this end, we employ methods based on (1) topological data analysis and (2) a minimum spanning tree of a data graph built on the point cloud under consideration (Steele's theorem). For both languages, and for all $n$, the intrinsic dimensions appear to be non-integer values (typical for fractal sets), close to 9.
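The minimum-spanning-tree approach mentioned in the abstract can be illustrated with a short sketch. It is not the authors' implementation. It relies on the standard reading of Steele's theorem: for $m$ points sampled from a $d$-dimensional set, the total Euclidean length of the minimum spanning tree grows roughly as $m^{(d-1)/d}$, so the slope $s$ of $\log L$ against $\log m$ gives $d \approx 1/(1-s)$. The function names `mst_length` and `mst_dimension`, the subsample sizes, and the synthetic test data (a flat 2-dimensional square embedded in 5 dimensions, used here in place of real $n$-gram embeddings) are all assumptions made for illustration.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_length(points):
    """Total Euclidean edge length of the minimum spanning tree of a point cloud."""
    # Dense pairwise distance matrix. SciPy treats zero entries as missing
    # edges, so exact duplicate points should be removed beforehand.
    dist = squareform(pdist(points))
    return minimum_spanning_tree(dist).sum()


def mst_dimension(points, sizes, repeats=5, seed=0):
    """Estimate intrinsic dimension from how MST length scales with sample size.

    Hypothetical helper: fits log L(m) = s * log m + c over the given
    subsample sizes and returns d = 1 / (1 - s).
    """
    rng = np.random.default_rng(seed)
    mean_lengths = []
    for m in sizes:
        # Average over several random subsamples to reduce noise.
        lengths = [
            mst_length(points[rng.choice(len(points), size=m, replace=False)])
            for _ in range(repeats)
        ]
        mean_lengths.append(np.mean(lengths))
    slope, _ = np.polyfit(np.log(sizes), np.log(mean_lengths), 1)
    return 1.0 / (1.0 - slope)


# Synthetic check: a 2-dimensional square embedded in 5-dimensional space.
rng = np.random.default_rng(42)
square = np.zeros((1000, 5))
square[:, :2] = rng.random((1000, 2))
estimate = mst_dimension(square, sizes=[100, 200, 400, 800])
print(f"estimated intrinsic dimension: {estimate:.2f}")
```

On the synthetic square the estimate should land near 2 rather than the ambient dimension 5, which is the property that makes the method useful for embeddings living in a high-dimensional space. For a target dimension near 9, as reported in the abstract, far more points would be needed than in this toy example, and a dense distance matrix would likely have to be replaced by a sparse neighbour graph.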

