Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration
This work highlights critical evaluation gaps in a method for improving long-context understanding in language models. The findings matter to researchers and practitioners who work with lengthy documents, although the contribution is incremental: it critiques an existing method rather than proposing a new solution.
The paper identifies two limitations in the evaluation of the Parallel Context Windows (PCW) method for extending language model context lengths: a missing simple baseline for few-shot classification, and unexpected deterioration on Chain-of-Thought reasoning tasks such as HotpotQA. It concludes that PCW may not provide sufficient improvement for handling lengthy documents in real-world settings.
We identify two crucial limitations in the evaluation of Parallel Context Windows (PCW), a recent parallel-integrated method that extends language models beyond their maximum context length (e.g., 2048 tokens for LLaMA) through window-wise attention and positional embedding techniques. We first show that the in-context few-shot classification evaluation omits a simple yet strong baseline, the weighted sum ensemble. Moreover, on more challenging Chain-of-Thought (CoT) reasoning tasks (e.g., HotpotQA), PCW exhibits unexpected deterioration in the form of question miscomprehension and false inference. Based on these findings, we suggest that the existing PCW design may not guarantee sufficient improvement or practicality for handling lengthy documents in real-world applications. Further community effort is needed to enable long-context understanding in language models.
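To make the window-wise attention and positional embedding idea concrete, the following is a minimal sketch of how a PCW-style layout might be built, based on our understanding of the original PCW design rather than on code from this paper. The function name `pcw_mask_and_positions` is hypothetical, and the exact position offset for task tokens (continuing after the longest window) is an assumption. Each context window attends causally only to itself and restarts its position ids at 0, while the trailing task tokens attend to every window and to earlier task tokens.

```python
from typing import List, Tuple


def pcw_mask_and_positions(
    window_lens: List[int], task_len: int
) -> Tuple[List[List[bool]], List[int]]:
    """Hypothetical sketch of a PCW-style attention mask and position ids.

    mask[i][j] is True when token i may attend to token j.
    Layout: all context windows first, then the task (test input) tokens.
    """
    total = sum(window_lens) + task_len
    mask = [[False] * total for _ in range(total)]
    positions: List[int] = []

    start = 0
    for n in window_lens:
        for i in range(n):
            positions.append(i)  # every window restarts its positions at 0
            for j in range(i + 1):
                mask[start + i][start + j] = True  # causal, within this window only
        start += n

    # Assumption: task tokens continue positions after the longest window.
    max_len = max(window_lens, default=0)
    task_start = start
    for t in range(task_len):
        row = task_start + t
        positions.append(max_len + t)
        for j in range(task_start):
            mask[row][j] = True  # task tokens see all context windows
        for j in range(task_start, row + 1):
            mask[row][j] = True  # causal attention among task tokens
    return mask, positions


if __name__ == "__main__":
    m, pos = pcw_mask_and_positions([3, 2], 2)
    print(pos)  # [0, 1, 2, 0, 1, 3, 4]
```

Because every window reuses the same position range, the model never sees position ids beyond its pretrained limit within a window, which is the property that lets PCW feed more demonstrations than the native context length allows.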
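The weighted sum ensemble baseline can be pictured as running the classifier separately on each window of demonstrations and combining the resulting label distributions. The sketch below is an illustrative assumption, not the paper's implementation: the function names are hypothetical, each window is assumed to yield one log-likelihood score per label, and the per-window weights are supplied by the caller (uniform by default) because the exact weighting scheme is not given here.

```python
import math
from typing import List, Optional, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert per-label log-likelihoods into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum_ensemble(
    window_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> int:
    """Hypothetical sketch: weighted sum of per-window label distributions.

    Returns the index of the label with the highest combined score.
    """
    if not window_probs:
        raise ValueError("need at least one window")
    n_labels = len(window_probs[0])
    if weights is None:
        weights = [1.0] * len(window_probs)  # assumption: uniform weights
    if len(weights) != len(window_probs):
        raise ValueError("one weight per window is required")

    scores = [0.0] * n_labels
    for w, probs in zip(weights, window_probs):
        if len(probs) != n_labels:
            raise ValueError("all windows must score the same label set")
        for k, p in enumerate(probs):
            scores[k] += w * p
    return max(range(n_labels), key=lambda k: scores[k])


if __name__ == "__main__":
    per_window = [[0.6, 0.4], [0.3, 0.7], [0.55, 0.45]]
    print(weighted_sum_ensemble(per_window))             # 1
    print(weighted_sum_ensemble(per_window, [2, 1, 1]))  # 0
```

Unlike PCW, this baseline needs no change to the attention mask or position ids: each window is an ordinary prompt within the native context length, and the windows interact only through the final weighted vote.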