AI | May 30

TRACE: Trajectory Risk-Aware Compression for Long-Horizon Agent Safety

arXiv:2606.00611 | 82.4 | h-index: 10 | Has Code
AI Analysis

For LLM safety researchers, TRACE addresses the problem of detecting sparse, delayed, and compositional risk signals in long agent trajectories, where existing turn-level or short-context detectors struggle to retain and aggregate evidence.

TRACE reframes long-horizon LLM agent safety detection as trajectory-level evidence compression. It improves accuracy over strong baselines by up to 12.6 percentage points across ASSEBench, Pre-Ex-Bench, and R-Judge, and degrades less as context length grows on LongSafety.

Long-horizon LLM agents produce safety evidence across long trajectories, where sparse, delayed, and compositional risk signals often escape local moderation. Existing turn-level or short-context detectors struggle to reliably retain and aggregate such evidence over extended horizons. We reframe long-horizon agent safety detection as trajectory-level evidence compression and propose Trajectory Risk-Aware Compression for Long-Horizon Agent Safety (TRACE). TRACE uses a Compressor-Reader design: the Compressor encodes the full trajectory into a compact latent evidence state under trajectory-level supervision, and the Reader judges the raw trajectory with this latent evidence state as a safety reference. This design helps aggregate dispersed risk cues and reduce premature evidence loss. Across ASSEBench, Pre-Ex-Bench, and R-Judge, TRACE achieves the best accuracy on all evaluated backbones, improving over strong baselines by up to 12.6 percentage points. On LongSafety, TRACE shows smaller performance degradation as context length grows. Attention visualizations and case studies suggest that the compressed reference helps the Reader focus on risk-critical segments and recover cross-step evidence. Code is available at https://github.com/Peregrine123/TRACE_official.
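The Compressor-Reader data flow described in the abstract can be illustrated with a deliberately simple toy. This is a sketch under heavy assumptions, not the authors' implementation: in TRACE both components are learned models and the evidence state is a latent representation, whereas here the "Compressor" is a keyword tally and the "Reader" is a threshold rule. The cue list `RISK_CUES`, the threshold, and all function names are hypothetical. The toy only shows why aggregating evidence over the whole trajectory can catch a compositional risk that a per-step check with the same threshold misses.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical risk cues and weights, chosen only for illustration.
RISK_CUES: Dict[str, float] = {
    "password": 1.0,
    "delete": 0.8,
    "upload": 0.6,
    "external": 0.6,
}
CUE_ORDER: List[str] = sorted(RISK_CUES)
THRESHOLD = 1.5  # hypothetical decision threshold


@dataclass
class EvidenceState:
    """Fixed-size stand-in for the compact evidence state: one slot per cue."""
    cue_mass: List[float]


def step_risk(step: str) -> float:
    """Risk mass of a single step, seen in isolation."""
    tokens = set(step.lower().split())
    return sum(w for cue, w in RISK_CUES.items() if cue in tokens)


def compress(trajectory: List[str]) -> EvidenceState:
    """Compressor stand-in: fold the full trajectory into a fixed-size state."""
    mass = [0.0] * len(CUE_ORDER)
    for step in trajectory:
        tokens = set(step.lower().split())
        for i, cue in enumerate(CUE_ORDER):
            if cue in tokens:
                mass[i] += RISK_CUES[cue]
    return EvidenceState(mass)


def read(trajectory: List[str], state: EvidenceState) -> dict:
    """Reader stand-in: re-read the raw steps with the state as a reference."""
    active = {c for c, m in zip(CUE_ORDER, state.cue_mass) if m > 0}
    flagged = [
        i for i, step in enumerate(trajectory)
        if active & set(step.lower().split())
    ]
    score = sum(state.cue_mass)
    return {"unsafe": score >= THRESHOLD, "flagged_steps": flagged, "score": score}


def turn_level_unsafe(trajectory: List[str]) -> bool:
    """Baseline stand-in: flag only if some single step crosses the threshold."""
    return any(step_risk(step) >= THRESHOLD for step in trajectory)


if __name__ == "__main__":
    trajectory = [
        "read config file",
        "found password in config",
        "list directory contents",
        "summarize notes",
        "upload archive to external server",
    ]
    print(turn_level_unsafe(trajectory))
    print(read(trajectory, compress(trajectory)))
```

In this toy, no single step crosses the threshold, so the per-step baseline stays silent, while the trajectory-level state accumulates the credential cue and the later exfiltration cues and the Reader flags both steps. The state also stays the same size however long the trajectory gets, which is the property the compression framing relies on.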

