CL | AI | LG | May 27, 2025

Exploring the Hidden Capacity of LLMs for One-Step Text Generation

arXiv:2505.21189v2 | 4 citations | h-index: 5 | EMNLP
AI Analysis

This work addresses slow autoregressive decoding, a fundamental bottleneck in text generation, and points to a potential improvement that reuses existing models.

The study shows that frozen large language models can generate hundreds of accurate tokens in a single forward pass from only two learned embeddings, revealing a hidden multi-token generation capability. This result suggests that off-the-shelf models may natively support faster generation without heavy retraining.

A recent study showed that large language models (LLMs) can reconstruct surprisingly long texts - up to thousands of tokens - via autoregressive generation from just one trained input embedding. In this work, we explore whether autoregressive decoding is essential for such reconstruction. We show that frozen LLMs can generate hundreds of accurate tokens in just one token-parallel forward pass, when provided with only two learned embeddings. This reveals a surprising and underexplored multi-token generation capability of autoregressive LLMs. We examine these embeddings and characterize the information they encode. We also empirically show that, although these representations are not unique for a given text, they form connected and local regions in embedding space - suggesting the potential to train a practical encoder. The existence of such representations hints that multi-token generation may be natively accessible in off-the-shelf LLMs via a learned input encoder, eliminating heavy retraining and helping to overcome the fundamental bottleneck of autoregressive decoding while reusing already-trained models.
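The core idea in the abstract (freeze the model, learn only two input embeddings, read all tokens from one parallel pass) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "model" is a tiny frozen stand-in built from a hypothetical position table `P` and readout matrix `W`, not an LLM, and the input layout `[e, m, m, ..., m]` is one assumed way of arranging the two embeddings. The sketch only shows the optimization pattern, not the paper's actual setup or results.

```python
import math
import random

random.seed(0)
D, V, N = 8, 10, 12  # embedding size, vocab size, target length (toy values)

# Frozen toy "model" (hypothetical stand-in for an LLM): these are never updated.
P = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
W = [[random.gauss(0, 1) / math.sqrt(D) for _ in range(D)] for _ in range(V)]

def forward(e, m):
    """One token-parallel pass over the assumed input [e, m, m, ..., m].

    Position t sees the causal mean of inputs 0..t, which for this input
    equals (e + t*m) / (t+1). Returns one row of logits per position.
    """
    logits = []
    for t in range(N):
        h = [P[t][k] + (e[k] + t * m[k]) / (t + 1) for k in range(D)]
        logits.append([sum(W[v][k] * h[k] for k in range(D)) for v in range(V)])
    return logits

def softmax(row):
    mx = max(row)
    ex = [math.exp(x - mx) for x in row]
    s = sum(ex)
    return [x / s for x in ex]

def loss_and_grads(e, m, target):
    """Mean cross-entropy over positions and its gradients w.r.t. e and m."""
    logits = forward(e, m)
    loss, ge, gm = 0.0, [0.0] * D, [0.0] * D
    for t in range(N):
        p = softmax(logits[t])
        loss -= math.log(p[target[t]]) / N
        p[target[t]] -= 1.0  # p now holds dLoss/dlogits (before the 1/N factor)
        for k in range(D):
            gh = sum(W[v][k] * p[v] for v in range(V)) / N
            ge[k] += gh / (t + 1)      # e contributes with weight 1/(t+1)
            gm[k] += gh * t / (t + 1)  # m contributes with weight t/(t+1)
    return loss, ge, gm

def decode(e, m):
    """Greedy read-out of all N tokens from a single forward pass."""
    return [max(range(V), key=lambda v: row[v]) for row in forward(e, m)]

# Build a reachable toy target by decoding from hidden embeddings.
e_star = [random.gauss(0, 3) for _ in range(D)]
m_star = [random.gauss(0, 3) for _ in range(D)]
target = decode(e_star, m_star)

# Learn only e and m by plain gradient descent; the model stays frozen.
e, m = [0.0] * D, [0.0] * D
initial_loss, _, _ = loss_and_grads(e, m, target)
for _ in range(300):
    _, ge, gm = loss_and_grads(e, m, target)
    e = [x - 0.2 * g for x, g in zip(e, ge)]
    m = [x - 0.2 * g for x, g in zip(m, gm)]
final_loss, _, _ = loss_and_grads(e, m, target)

tokens = decode(e, m)
accuracy = sum(a == b for a, b in zip(tokens, target)) / N
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}, accuracy {accuracy:.2f}")
```

The point of the sketch is the division of labour: `decode` calls `forward` once and reads every position in parallel, whereas autoregressive decoding would need one call per generated token. With a real LLM, the analogous step would be to pass the learned vectors as input embeddings while keeping all model weights fixed.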
