CLMay 30

ProactiveLLM: Learning Active Interaction for Streaming Large Language Models

Junlong Tong, Yao Zhang, Anhao Zhao, Yingqi Fan, Yunpu Ma, Xiaoyu Shen

arXiv:2606.0052356.5h-index: 6Has Code

AI Analysis

For streaming LLM applications, this work addresses the bottleneck of inefficient interaction timing without requiring external alignment signals.

ProactiveLLM enables streaming LLMs to actively decide when to interact with input streams by using endogenous model states, reducing latency while maintaining quality across text and speech tasks.

Standard Large Language Models (LLMs) follow a read-then-generate paradigm, causing unnecessary latency and computation. Streaming LLMs alleviate this issue by generating while receiving inputs, but still struggle to decide when to interact with the stream. Existing methods either hard-code interaction timing or rely on costly external alignment signals, such as timing labels, reasoning trajectories, or stronger teachers. In this paper, we propose ProactiveLLM, which achieves active interaction by leveraging the model's endogenous states to guide interaction decisions. The model first learns to perceive semantic sufficiency from partial inputs through two complementary training mechanisms: mask-based streaming modeling and synchronized privileged self-distillation (SPSD). The former applies monotonic random masking to the input during training, simulating progressively revealed streaming inputs and enabling the model to learn local semantic dependencies from partial-input views. The latter aligns the partial-context student view with a full-context teacher view generated by the same evolving model, allowing privileged full-context evidence to guide the student's understanding under incomplete observations. Together, these mechanisms induce endogenous sufficiency cues without requiring external teachers or annotations, providing a versatile foundation for the plug-and-play integration of diverse decision heads. Extensive evaluation across text and speech streaming tasks confirms that ProactiveLLM significantly reduces interaction latency while maintaining quality, validating its capacity for dynamic and active interaction. Code is publicly available at https://github.com/EIT-NLP/StreamingLLM/tree/main/ProactiveLLM.

View on arXiv PDF Code

Similar