CL · AI · Dec 30, 2025

Efficient Context Scaling with LongCat ZigZag Attention

Chen Zhang, Yang Bai, Jiahuan Li, Anchun Gui, Keheng Wang, Feifan Liu, Guanyu Wu, Yuwei Jiang, Defei Bu, Li Wei, Haihang Jing, Hongyin Tang

arXiv:2512.23966v2 · 6.75 citations · h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses efficient long-context processing for applications such as retrieval-augmented generation and tool-integrated reasoning. It is an incremental improvement in sparse attention methods.

The paper tackles context-length scaling in attention-based models by introducing LongCat ZigZag Attention (LoZA), a sparse attention scheme that converts full-attention models into sparse versions under a limited compute budget. It reports significant speed-ups in long-context scenarios and supports processing of up to 1 million tokens.

We introduce LongCat ZigZag Attention (LoZA), a sparse attention scheme designed to transform any existing full-attention model into a sparse version with a rather limited compute budget. In long-context scenarios, LoZA can achieve significant speed-ups in both prefill-intensive (e.g., retrieval-augmented generation) and decode-intensive (e.g., tool-integrated reasoning) cases. Specifically, by applying LoZA to LongCat-Flash during mid-training, we serve LongCat-Flash-Exp as a long-context foundation model that can swiftly process up to 1 million tokens, enabling efficient long-term reasoning and long-horizon agentic capabilities.

