Emergent Response Planning in LLMs
This research asks whether large language models plan their responses before generating them, a question that matters to developers and users who want more transparent and controllable generation.
This work demonstrates that large language models exhibit emergent planning behaviors, with their hidden representations encoding future outputs beyond the next token, including structure, content, and behavior attributes. The study found that this planning behavior scales with model size and evolves during generation.
In this work, we argue that large language models (LLMs), though trained to predict only the next token, exhibit emergent planning behaviors: $\textbf{their hidden representations encode future outputs beyond the next token}$. Through simple probing, we demonstrate that LLM prompt representations encode global attributes of their entire responses, including $\textit{structure attributes}$ (e.g., response length, reasoning steps), $\textit{content attributes}$ (e.g., character choices in storywriting, multiple-choice answers at the end of the response), and $\textit{behavior attributes}$ (e.g., answer confidence, factual consistency). In addition to identifying response planning, we explore how it scales with model size across tasks and how it evolves during generation. The finding that LLMs plan ahead in their hidden representations suggests potential applications for improving transparency and generation control.
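The probing idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the "hidden states" are synthetic random vectors rather than activations from a real LLM, the probed attribute is a made-up response-length score, and the probe is a closed-form ridge regression, which may differ from the probes the authors actually trained. The helper names `fit_ridge_probe` and `r_squared` are hypothetical.

```python
import numpy as np


def fit_ridge_probe(X, y, l2=1.0):
    """Fit a linear probe y ~ X @ w + b with L2 regularization (closed form)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    d = X.shape[1]
    # Solve the regularized normal equations on centered data.
    w = np.linalg.solve(Xc.T @ Xc + l2 * np.eye(d), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def r_squared(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# ASSUMPTION: synthetic stand-in for prompt representations. In a real study,
# each row would be the hidden state at the last prompt token of one prompt.
rng = np.random.default_rng(0)
n_prompts, hidden_dim = 600, 64
H = rng.normal(size=(n_prompts, hidden_dim))

# ASSUMPTION: the attribute (a response-length score) is linearly encoded
# along one direction plus noise, so the probe has something to recover.
direction = rng.normal(size=hidden_dim)
response_length = H @ direction + 0.5 * rng.normal(size=n_prompts)

# Hold out prompts so the score reflects generalization, not memorization.
H_train, H_test = H[:500], H[500:]
y_train, y_test = response_length[:500], response_length[500:]

w, b = fit_ridge_probe(H_train, y_train)
test_r2 = r_squared(y_test, H_test @ w + b)

# Control: a probe trained on shuffled labels should fail on held-out data.
y_shuffled = rng.permutation(y_train)
w_ctrl, b_ctrl = fit_ridge_probe(H_train, y_shuffled)
control_r2 = r_squared(y_test, H_test @ w_ctrl + b_ctrl)

print(f"probe R^2: {test_r2:.3f}, shuffled-label control R^2: {control_r2:.3f}")
```

The shuffled-label control is included because a high probe score alone does not show that the representation encodes the attribute; the gap between the real probe and the control is the more informative quantity. With real models, the synthetic matrix `H` would be replaced by extracted prompt activations and `response_length` by an attribute measured on the generated responses.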