CL, AI · Nov 12, 2024

Large Language Models Can Self-Improve in Long-context Reasoning

arXiv:2411.08147v1 · 27 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the bottleneck of long-context reasoning in LLMs, which is essential for their advancement. The contribution is incremental, since it builds on existing self-improvement techniques.

The paper tackles the problem of large language models struggling with long-context reasoning by proposing a self-improvement approach that uses sampled outputs and Minimum Bayes Risk scoring for fine-tuning, resulting in an absolute improvement of 4.2 points for Llama-3.1-8B-Instruct.

Large language models (LLMs) have achieved substantial progress in processing long contexts but still struggle with long-context reasoning. Existing approaches typically involve fine-tuning LLMs with synthetic data, which depends on annotations from human experts or advanced models like GPT-4, thus restricting further advancements. To address this issue, we investigate the potential for LLMs to self-improve in long-context reasoning and propose SeaLong, an approach specifically designed for this purpose. This approach is straightforward: we sample multiple outputs for each question, score them with Minimum Bayes Risk, and then apply supervised fine-tuning or preference optimization based on these outputs. Extensive experiments on several leading LLMs demonstrate the effectiveness of SeaLong, with an absolute improvement of 4.2 points for Llama-3.1-8B-Instruct. Furthermore, SeaLong achieves superior performance compared to prior approaches that depend on data produced by human experts or advanced models. We anticipate that this work will open new avenues for self-improvement techniques in long-context scenarios, which are essential for the continual advancement of LLMs.
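
The scoring step described above can be illustrated with a minimal sketch. It assumes a Minimum Bayes Risk score defined as each sampled output's mean similarity to the other samples, so that outputs agreeing with the consensus rank higher. The abstract does not specify the similarity metric, so the token-overlap (Jaccard) function here is a stand-in, and the names `jaccard`, `mbr_scores`, and `select_pair` are hypothetical. This is not the authors' implementation.

```python
from typing import Callable, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity. ASSUMPTION: a stand-in utility metric,
    not the one used in the paper."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def mbr_scores(
    outputs: List[str],
    sim: Callable[[str, str], float] = jaccard,
) -> List[float]:
    """Score each sampled output by its mean similarity to the other samples."""
    n = len(outputs)
    if n < 2:
        return [0.0] * n
    return [
        sum(sim(outputs[i], outputs[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def select_pair(outputs: List[str]) -> Tuple[str, str]:
    """Return (highest-scoring, lowest-scoring) outputs.

    The top output could serve as a supervised fine-tuning target; the pair
    could serve as (chosen, rejected) for preference optimization.
    """
    scores = mbr_scores(outputs)
    ranked = sorted(range(len(outputs)), key=lambda i: scores[i], reverse=True)
    return outputs[ranked[0]], outputs[ranked[-1]]


# Hypothetical samples for one question.
samples = [
    "the answer is paris",
    "the answer is paris france",
    "i think it is london",
    "the answer is paris",
]
chosen, rejected = select_pair(samples)
```

In this toy example the outlier answer gets the lowest score because it shares almost no tokens with the other samples. In practice, a semantic similarity measure would likely be a better utility than raw token overlap.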

Code Implementations: 1 repo
