CL · Apr 9

SepSeq: A Training-Free Framework for Long Numerical Sequence Processing in LLMs

Jie Sun, Yu Liu, Lu Han, Qiwen Deng, Xiang Shu, Yang Xiao, Xingyu Lu, Jun Zhou, Pengfei Liu, Lintao Ma, Jiancan Wu, Xiang Wang

arXiv:2604.07737 · 93.3

Predicted impact: top 19% in CL · last 90 days
Originality: Incremental advance

AI Analysis

SepSeq addresses a specific bottleneck for LLM users in domains that require long numerical sequence processing, and it does so without any additional training.

The paper tackles the performance degradation of transformer-based Large Language Models (LLMs) on long numerical sequences, reporting an average relative accuracy improvement of 35.6% and a 16.4% average reduction in total inference token consumption.

While transformer-based Large Language Models (LLMs) theoretically support massive context windows, they suffer from severe performance degradation when processing long numerical sequences. We attribute this failure to the attention dispersion in the Softmax mechanism, which prevents the model from concentrating attention. To overcome this, we propose Separate Sequence (SepSeq), a training-free, plug-and-play framework to mitigate dispersion by strategically inserting separator tokens. Mechanistically, we demonstrate that separator tokens act as an attention sink, recalibrating attention to focus on local segments while preserving global context. Extensive evaluations on 9 widely-adopted LLMs confirm the effectiveness of our approach: SepSeq yields an average relative accuracy improvement of 35.6% across diverse domains while reducing total inference token consumption by 16.4% on average.
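To make the idea in the abstract concrete, here is a minimal sketch of the two ingredients it describes: separator insertion and softmax dispersion. It is an illustration under stated assumptions, not the authors' implementation. The function names, the fixed chunk size, the newline separator, and the comma delimiter are all hypothetical choices; the abstract does not say which separator token SepSeq uses or how insertion positions are chosen. The second function shows only the simplest case of dispersion (equal attention scores), where the largest softmax weight is exactly 1/n.

```python
import math


def insert_separators(values, chunk_size=8, separator="\n", delimiter=", "):
    """Serialize numbers as text, placing `separator` between chunks.

    Assumption: a fixed chunk size and a newline separator. The paper's
    actual insertion strategy may differ.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunks = [
        delimiter.join(str(v) for v in values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return separator.join(chunks)


def max_softmax_weight(scores):
    """Return the largest weight of a numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    return max(exps) / sum(exps)


if __name__ == "__main__":
    # Separator insertion on a toy sequence of ten numbers.
    print(insert_separators(list(range(1, 11)), chunk_size=4, separator=" | "))

    # With equal scores, the largest attention weight is 1/n, so it
    # shrinks as the sequence gets longer.
    for n in (10, 100, 1000):
        print(n, max_softmax_weight([0.0] * n))
```

In practice the serialized string would be placed in the model prompt in place of the unsegmented sequence; choosing the chunk size and separator would need to be validated per model and tokenizer.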

