CL · Feb 17, 2025

From the New World of Word Embeddings: A Comparative Study of Small-World Lexico-Semantic Networks in LLMs

arXiv:2502.11380v2 · 2 citations · h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses a gap in the understanding of semantic structure in LLMs, which is relevant to natural language processing researchers. Its contribution is incremental, however: it extends existing network analysis methods to a new model type.

The study constructs lexico-semantic networks from decoder-only large language models (LLMs), which are underexplored compared to encoder-based models. It finds that these networks exhibit small-world properties, with high clustering and short path lengths. Larger LLMs produce more intricate networks with weaker small-world effects and longer paths.

Lexico-semantic networks represent words as nodes and their semantic relatedness as edges. While such networks are traditionally constructed using embeddings from encoder-based models or static vectors, embeddings from decoder-only large language models (LLMs) remain underexplored. Unlike encoder models, LLMs are trained with a next-token prediction objective, which does not directly encode the meaning of the current token. In this paper, we construct lexico-semantic networks from the input embeddings of LLMs with varying parameter scales and conduct a comparative analysis of their global and local structures. Our results show that these networks exhibit small-world properties, characterized by high clustering and short path lengths. Moreover, larger LLMs yield more intricate networks with less small-world effects and longer paths, reflecting richer semantic structures and relations. We further validate our approach through analyses of common conceptual pairs, structured lexical relations derived from WordNet, and a cross-lingual semantic network for qualitative words.
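The two measurements named in the abstract, clustering and path length, can be illustrated with a minimal sketch. It is not the authors' pipeline: the k-nearest-neighbour edge rule, the value of `k`, the helper names, and the random toy vectors standing in for LLM input embeddings are all assumptions made for illustration, since the abstract does not say how edges are selected.

```python
import math
import random
from collections import deque


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_graph(embeddings, k):
    """Undirected graph linking each word to its k most similar words.

    Assumption: a kNN rule is used here only as one plausible way to
    turn embedding similarity into edges.
    """
    words = list(embeddings)
    adj = {w: set() for w in words}
    for w in words:
        ranked = sorted(
            (x for x in words if x != w),
            key=lambda x: cosine(embeddings[w], embeddings[x]),
            reverse=True,
        )
        for x in ranked[:k]:
            adj[w].add(x)
            adj[x].add(w)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient (nodes of degree < 2 count as 0)."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        nb = list(nbrs)
        links = sum(
            1
            for i in range(d)
            for j in range(i + 1, d)
            if nb[j] in adj[nb[i]]
        )
        total += 2 * links / (d * (d - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean BFS distance over all reachable ordered node pairs."""
    total = 0
    pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


# Toy stand-in for LLM input embeddings: 40 random 8-dimensional vectors.
rng = random.Random(0)
toy_embeddings = {f"w{i}": [rng.gauss(0, 1) for _ in range(8)] for i in range(40)}

graph = knn_graph(toy_embeddings, k=4)
print("average clustering:", average_clustering(graph))
print("average path length:", average_path_length(graph))
```

A network is usually called small-world when its clustering is much higher than that of a random graph with the same number of nodes and edges while its average path length stays comparably short. That baseline comparison is omitted here to keep the sketch short.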
