DG AI

Mar 19, 2025

Probing the topology of the space of tokens with structured prompts

Michael Robinson, Sourya Dey, Taisa Kushner

arXiv:2503.15421v15.14 citations

h-index: 8

Mathematics

Originality: Highly original

AI Analysis

This work helps machine learning and AI researchers understand the internal structure of LLMs. Its approach applies not only to LLMs but also to general nonlinear autoregressive processes.

The authors show how to prompt an LLM so that it reveals its hidden token input embedding up to homeomorphism, prove mathematically that the method should work for generic LLMs, and demonstrate it by recovering the token subspace of Llemma-7B.

This article presents a general and flexible method for prompting a large language model (LLM) to reveal its (hidden) token input embedding up to homeomorphism. Moreover, this article provides strong theoretical justification -- a mathematical proof for generic LLMs -- for why this method should be expected to work. With this method in hand, we demonstrate its effectiveness by recovering the token subspace of Llemma-7B. The results of this paper apply not only to LLMs but also to general nonlinear autoregressive processes.
