CL | AI | Dec 30, 2025

Efficient Context Scaling with LongCat ZigZag Attention

arXiv:2512.23966v2 | 5 citations | h-index: 7
AI Analysis

This work addresses the challenge of efficient long-context processing for applications such as retrieval-augmented generation and tool-integrated reasoning. It represents an incremental improvement in sparse attention methods.

The paper tackles context-length scaling in attention-based models by introducing LongCat ZigZag Attention (LoZA), a sparse attention scheme that converts existing full-attention models into sparse versions with a limited compute budget. It reports significant speed-ups in long-context scenarios and supports processing of up to 1 million tokens.

We introduce LongCat ZigZag Attention (LoZA), a sparse attention scheme designed to transform any existing full-attention model into a sparse version with a rather limited compute budget. In long-context scenarios, LoZA can achieve significant speed-ups both for prefill-intensive (e.g., retrieval-augmented generation) and decode-intensive (e.g., tool-integrated reasoning) cases. Specifically, by applying LoZA to LongCat-Flash during mid-training, we serve LongCat-Flash-Exp as a long-context foundation model that can swiftly process up to 1 million tokens, enabling efficient long-term reasoning and long-horizon agentic capabilities.

