CL · Nov 8, 2024

Reducing Distraction in Long-Context Language Models by Focused Learning

arXiv:2411.05928v1 · 9 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This addresses a key bottleneck for users of long-context LLMs, though the contribution is incremental, since it builds on existing fine-tuning and contrastive learning techniques.

The paper tackles the problem of distraction in long-context language models, where irrelevant information reduces focus on relevant segments, by proposing a novel training method that improves performance on long single-document and multi-document QA benchmarks.

Recent advancements in Large Language Models (LLMs) have significantly enhanced their capacity to process long contexts. However, effectively utilizing this long context remains a challenge due to the issue of distraction, where irrelevant information dominates lengthy contexts, causing LLMs to lose focus on the most relevant segments. To address this, we propose a novel training method that enhances LLMs' ability to discern relevant information through a unique combination of retrieval-based data augmentation and contrastive learning. Specifically, during fine-tuning with long contexts, we employ a retriever to extract the most relevant segments, serving as augmented inputs. We then introduce an auxiliary contrastive learning objective to explicitly ensure that outputs from the original context and the retrieved sub-context are closely aligned. Extensive experiments on long single-document and multi-document QA benchmarks demonstrate the effectiveness of our proposed method.
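The abstract combines two ingredients: retrieving the most relevant segments of a long context to form an augmented input, and a contrastive objective that pulls the outputs for the full context and for the retrieved sub-context together. The following is a minimal sketch of both ideas under loudly stated assumptions. The lexical-overlap scorer is a hypothetical stand-in for the paper's retriever, and the fixed-size word segmentation, the InfoNCE form of the loss, the in-batch negatives, the `temperature`, and the weighting in `total_loss` are all illustrative choices, not the authors' implementation. Representations are plain lists of floats standing in for pooled model outputs.

```python
import math
import re
from typing import List, Sequence


def split_segments(context: str, seg_words: int = 8) -> List[str]:
    """Chunk a long context into fixed-size word segments (assumed granularity)."""
    words = context.split()
    return [" ".join(words[i:i + seg_words]) for i in range(0, len(words), seg_words)]


def overlap_score(query: str, segment: str) -> int:
    """Hypothetical retriever: number of lowercase word tokens shared with the query."""
    def tokens(s: str) -> set:
        return set(re.findall(r"\w+", s.lower()))
    return len(tokens(query) & tokens(segment))


def retrieve_subcontext(query: str, context: str, k: int = 2, seg_words: int = 8) -> str:
    """Keep the top-k scoring segments, restored to their original document order."""
    segments = split_segments(context, seg_words)
    ranked = sorted(range(len(segments)),
                    key=lambda i: overlap_score(query, segments[i]),
                    reverse=True)
    keep = sorted(ranked[:k])
    return " ".join(segments[i] for i in keep)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def info_nce(full_reps: List[Sequence[float]],
             sub_reps: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: full_reps[i] should match sub_reps[i] rather than sub_reps[j != i]."""
    n = len(full_reps)
    total = 0.0
    for i in range(n):
        logits = [cosine(full_reps[i], sub_reps[j]) / temperature for j in range(n)]
        m = max(logits)  # log-sum-exp with max subtraction for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / n


def total_loss(lm_loss_full: float, lm_loss_sub: float,
               contrastive: float, alpha: float = 0.5) -> float:
    """Hypothetical combination: both LM losses plus a weighted contrastive term."""
    return lm_loss_full + lm_loss_sub + alpha * contrastive
```

For example, `retrieve_subcontext("what is the capital of france", "alpha beta gamma delta paris is the capital of france omega sigma tau", k=2, seg_words=4)` keeps the second and third segments and drops the unrelated ones. Restoring the retrieved segments to their original order keeps the sub-context readable as a shortened version of the document; using the other examples in a batch as negatives is a common way to make a contrastive loss informative, though the paper may define its negatives and alignment target differently.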

