MA · AI · May 9

Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents

arXiv:2605.08580 · 91.1
Predicted impact: top 6% in MA · last 90 days
Originality: Incremental advance
AI Analysis

For developers of long-horizon LLM agents, Slipstream addresses the validation gap in compaction that silently degrades accuracy, offering a practical solution with measurable gains.

Slipstream introduces asynchronous trajectory compaction for long-horizon LLM agents, validating summaries against the agent's continued reasoning to avoid error propagation. It improves task accuracy by up to 8.8 percentage points and reduces latency by up to 39.7% on SWE-bench Verified and BrowseComp.

To cope with the large contexts that long-horizon LLM agents produce, modern frameworks increasingly rely on compaction -- invoking an LLM to rewrite the accumulated trajectory into a shorter summary that the agent resumes from. Today, compaction runs synchronously on the critical path of agent execution, but this can unpredictably degrade accuracy due to a structural validation gap: the compactor must condense context but is fundamentally unaware of precisely what information the agent will need later. Further, because post-compaction agent steps are conditioned on the new summary, targeted validation criteria do not exist and errors silently propagate through coherent but incorrect behavior. Our key insight is that asynchronous compaction efficiently addresses this gap: by running the compactor in parallel with continued agent execution on the original context, the candidate summary and the agent's next steps are generated independently from the same pre-compaction state, yielding a validation signal independent of the summary itself. We build Slipstream, a trajectory-grounded compaction system that uses a judge to validate the candidate summary against the agent's continued reasoning, checking that it preserves both the agent's forward intent and the key facts and constraints it depends on. Across long-horizon coding (SWE-bench Verified) and web-browsing (BrowseComp) workloads, Slipstream improves task accuracy by up to 8.8 percentage points while reducing end-to-end latency by up to 39.7%.
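The control flow described in the abstract can be illustrated with a minimal sketch. This is not Slipstream's implementation: every name here (`compact`, `continue_agent`, `judge`, `compact_with_validation`, `Verdict`) is hypothetical, the compactor and agent are toy stand-ins for LLM calls, and the judge is a simple substring check rather than an LLM judge. The fallback of keeping the original context when the summary is rejected is also an assumption, since the abstract does not say what happens on rejection.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Verdict:
    accepted: bool
    missing: list[str]  # key facts the continued steps used but the summary dropped


async def compact(trajectory: list[str]) -> str:
    """Hypothetical compactor; stands in for an LLM summarization call."""
    await asyncio.sleep(0)  # placeholder for model latency
    # Toy heuristic: keep only the first and last trajectory entries.
    return " | ".join([trajectory[0], trajectory[-1]])


async def continue_agent(trajectory: list[str], steps: int) -> list[str]:
    """Hypothetical agent; keeps working on the ORIGINAL, uncompacted context."""
    await asyncio.sleep(0)
    # Toy behavior: each next step relies on the second trajectory entry.
    return [f"step {i}: rely on {trajectory[1]}" for i in range(steps)]


def judge(summary: str, next_steps: list[str], key_facts: list[str]) -> Verdict:
    """Toy judge: any fact the continued steps use must survive in the summary."""
    needed = [f for f in key_facts if any(f in s for s in next_steps)]
    missing = [f for f in needed if f not in summary]
    return Verdict(accepted=not missing, missing=missing)


async def compact_with_validation(trajectory, key_facts, steps=2):
    # Both coroutines start from the same pre-compaction state and run
    # concurrently, so the next steps are not conditioned on the summary.
    summary, next_steps = await asyncio.gather(
        compact(trajectory), continue_agent(trajectory, steps)
    )
    verdict = judge(summary, next_steps, key_facts)
    if verdict.accepted:
        # Resume from the validated summary plus the steps already taken.
        return [summary] + next_steps, verdict
    # Assumed fallback: keep the full context if the summary is rejected.
    return trajectory + next_steps, verdict
```

In this toy setup, a summary that drops a fact the agent goes on to use is rejected, and the agent simply continues on the uncompacted trajectory. A real system would replace the substring check with a judge model and would need a policy for retrying or repairing rejected summaries.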

