LG · CL · Dec 11, 2025

Asynchronous Reasoning: Training-Free Interactive Thinking LLMs

arXiv:2512.10931v2 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for responsive LLMs in voice-based or embodied assistants, though it is an incremental improvement over existing reasoning methods.

The paper makes reasoning LLMs interactive for real-time applications by enabling them to think, listen, and write outputs asynchronously without additional training. It reduces the time to first non-thinking token from minutes to ≤5 seconds and cuts overall real-time delays by up to 12×.

Many state-of-the-art LLMs are trained to think before giving their answer. Reasoning can greatly improve language model capabilities, but it also makes them less interactive: given a new input, a model must stop thinking before it can respond. Real-world use cases such as voice-based or embodied assistants require an LLM agent to respond and adapt to additional information in real time, which is incompatible with sequential interactions. In contrast, humans can listen, think, and act asynchronously: we begin thinking about the problem while reading it and continue thinking while formulating the answer. In this work, we augment LLMs capable of reasoning to operate in a similar way without additional training. Our method uses the properties of positional embeddings to enable LLMs built for sequential generation to simultaneously think, listen, and write outputs. We evaluate our approach on math, commonsense, and safety reasoning: it allows models to generate accurate thinking-augmented answers while reducing time to first non-thinking token from minutes to ≤5s and the overall real-time delays by up to 12×.
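
The abstract does not say which positional embedding the method relies on. Purely as an illustration, the sketch below assumes rotary positional embeddings (RoPE), where the attention score between a query and a key depends on the offset between their positions and not on the absolute positions themselves. A property of this kind is what could let separate token streams be placed at chosen positions without retraining. This is not the paper's implementation; `rotate` and `score` are hypothetical helper names.

```python
import math

def rotate(vec, position, base=10000.0):
    """Apply a rotary positional embedding to an even-length vector (assumed RoPE form)."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Each coordinate pair is rotated by an angle proportional to the position.
        theta = position * base ** (-i / dim)
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

def score(query, q_pos, key, k_pos):
    """Attention logit between a position-rotated query and a position-rotated key."""
    q = rotate(query, q_pos)
    k = rotate(key, k_pos)
    return sum(a * b for a, b in zip(q, k))

# Illustrative toy vectors, not taken from any model.
query = [0.3, -1.2, 0.7, 0.5]
key = [1.1, 0.4, -0.6, 0.9]

# Same offset (3 positions apart) at two different absolute positions.
near = score(query, 5, key, 2)
far = score(query, 105, key, 102)

# A different offset (4 positions apart) for comparison.
shifted_gap = score(query, 5, key, 1)

print(near, far, shifted_gap)
```

Here `near` and `far` agree up to floating-point error because only the offset enters the score, while `shifted_gap` differs because the offset changed.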
