CL, Jul 31, 2024

Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models

Stanford
arXiv:2407.21417v1, 23 citations, h-index: 20
Originality: Incremental advance
AI Analysis

This paper addresses a key challenge in aligning language models for reliable and controllable AI applications, though its contribution is an incremental improvement on existing training approaches.

The paper identifies a trade-off between instruction following and faithfulness in language models, showing that optimizing for one degrades the other, and proposes ReSet, a method that outperforms multi-task learning while training on three-fold less data.

Modern language models (LMs) need to follow human instructions while being faithful; yet, they often fail to achieve both. Here, we provide concrete evidence of a trade-off between instruction following (i.e., follow open-ended instructions) and faithfulness (i.e., ground responses in given context) when training LMs with these objectives. For instance, fine-tuning LLaMA-7B on instruction following datasets renders it less faithful. Conversely, instruction-tuned Vicuna-7B shows degraded performance at following instructions when further optimized on tasks that require contextual grounding. One common remedy is multi-task learning (MTL) with data mixing, yet it remains far from achieving a synergic outcome. We propose a simple yet effective method that relies on Rejection Sampling for Continued Self-instruction Tuning (ReSet), which significantly outperforms vanilla MTL. Surprisingly, we find that less is more, as training ReSet with high-quality, yet substantially smaller data (three-fold less) yields superior results. Our findings offer a better understanding of objective discrepancies in alignment training of LMs.
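The abstract names rejection sampling as the core of ReSet but does not spell out the procedure, so the following is only a minimal sketch of generic rejection sampling for building a self-instruction training set, not the authors' implementation. The function `rejection_sample`, the callables `generate`, `score_instruction`, and `score_faithfulness`, and the choice to rank candidates by the sum of the two scores are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str, random.Random], str],
    score_instruction: Callable[[str, str], float],
    score_faithfulness: Callable[[str, str], float],
    num_candidates: int = 4,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Keep the best-scoring sampled response for each prompt.

    All callables are hypothetical stand-ins: `generate` would wrap the
    model being tuned, and the two scorers would wrap whatever judges
    rate instruction following and contextual grounding.
    """
    rng = random.Random(seed)
    selected: List[Tuple[str, str]] = []
    for prompt in prompts:
        # Sample several candidate responses from the model itself.
        candidates = [generate(prompt, rng) for _ in range(num_candidates)]

        # Assumption: rank by the unweighted sum of both objectives,
        # so a response must do well on both to be kept.
        def combined(response: str) -> float:
            return score_instruction(prompt, response) + score_faithfulness(
                prompt, response
            )

        best = max(candidates, key=combined)
        selected.append((prompt, best))
    # The (prompt, response) pairs would then serve as data for
    # continued fine-tuning of the same model.
    return selected
```

In this sketch the filtered pairs form a smaller, higher-quality training set, which is in the spirit of the abstract's "less is more" observation; how the paper actually scores and filters candidates should be checked against the full text.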

Code Implementations: 1 repo
