CL, AI | Dec 25, 2025

Context Discipline and Performance Correlation: Analyzing LLM Performance and Quality Degradation Under Varying Context Lengths

Ahilan Ayyachamy Nadar Ponnusamy, Karthic Chandran, M Maruf Hossain

arXiv:2601.11564v1 | 1 citation | h-index: 1

Originality: Incremental advance

AI Analysis

The paper addresses computational overhead and quality degradation in LLMs, a concern for researchers and practitioners scaling context windows. The advance is incremental because it builds on existing transformer and MoE architectures.

This paper investigates the trade-off between system performance and model quality in large language models (LLMs) when handling irrelevant context, finding non-linear performance degradation linked to Key-Value cache growth and identifying behavioral anomalies in Mixture-of-Experts architectures at high token volumes.

The scaling trend in Large Language Models (LLMs) has prioritized increasing the maximum context window to facilitate complex, long-form reasoning and document analysis. However, managing this expanded context introduces severe computational overhead. This paper investigates the critical trade-off between system performance and model quality when dense transformer architectures--specifically Llama-3.1-70B and Qwen1.5-14B--are exposed to large volumes of irrelevant and distracting context. The research identifies a non-linear performance degradation tied to the growth of the Key-Value (KV) cache. Furthermore, an extended analysis of the Mixture-of-Experts (MoE) architecture reveals unique behavioral anomalies at varying context scales, suggesting that architectural benefits may be masked by infrastructure bottlenecks at high token volumes.
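To make the KV cache growth mentioned above concrete, the following is a rough back-of-the-envelope sketch, not the authors' measurement method. It estimates cache size from token count using the standard "keys plus values per layer" accounting. The function name and the configuration values (80 layers, 8 key-value heads, head dimension 128, 16-bit values) are assumptions taken from the publicly released Llama-3.1-70B configuration, not from the paper.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim,
                   bytes_per_value=2, batch_size=1):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    The leading factor of 2 counts one key tensor and one value tensor
    per layer. All arguments are illustrative assumptions, not values
    reported in the paper.
    """
    return (2 * batch_size * num_tokens * num_layers
            * num_kv_heads * head_dim * bytes_per_value)


# Assumed Llama-3.1-70B settings (grouped-query attention, fp16/bf16 cache).
LLAMA_70B = dict(num_layers=80, num_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for tokens in (4_096, 32_768, 131_072):
        gib = kv_cache_bytes(tokens, **LLAMA_70B) / 2**30
        print(f"{tokens:>7} tokens -> {gib:.2f} GiB of KV cache")
```

Under these assumptions each token adds 320 KiB of cache, so a single 131,072-token sequence needs about 40 GiB. This estimate is linear in token count; it does not by itself model the non-linear slowdown the abstract reports, which depends on how the serving stack handles a cache of that size.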
