Probing the topology of the space of tokens with structured prompts
This work addresses the challenge of understanding the internal structure of LLMs and is aimed at researchers in machine learning and AI. Its approach applies not only to LLMs but also to general nonlinear autoregressive processes.
The authors address the problem of revealing the hidden token input embedding of a large language model, up to homeomorphism, through prompting. They give a mathematical proof that the method should work for generic LLMs and demonstrate it by recovering the token subspace of Llemma-7B.
This article presents a general and flexible method for prompting a large language model (LLM) to reveal its (hidden) token input embedding up to homeomorphism. Moreover, it provides strong theoretical justification, in the form of a mathematical proof for generic LLMs, for why this method should be expected to work. With this method in hand, we demonstrate its effectiveness by recovering the token subspace of Llemma-7B. The results of this paper apply not only to LLMs but also to general nonlinear autoregressive processes.
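For readers unfamiliar with the phrase "up to homeomorphism", the standard definition is recalled below. The symbols T and \hat{T} are introduced here only for illustration and are not necessarily the paper's notation.

```latex
% Notation assumed for illustration (not taken from the paper):
%   T       : the token subspace, i.e. the image of the token input embedding in R^d
%   \hat{T} : the set recovered from the model's responses to prompts
\[
  f \colon T \to \hat{T}
  \quad \text{is a homeomorphism if } f \text{ is a continuous bijection and } f^{-1} \text{ is continuous.}
\]
```

Under this reading, a recovery up to homeomorphism preserves topological properties of the token subspace, such as its connected components, but need not preserve distances or angles between token embeddings.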