CL | Jul 31, 2024

Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models

Zhengxuan Wu, Yuhao Zhang, Peng Qi, Yumo Xu, Rujun Han, Yian Zhang, Jifan Chen, Bonan Min, Zhiheng Huang

Stanford

arXiv:2407.21417v112.923 citations | h-index: 20 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a key challenge in aligning language models for reliable and controllable AI applications, though its contribution is an incremental improvement on existing training approaches.

The paper identifies a trade-off between instruction following and faithfulness in language models, showing that optimizing for one degrades the other, and proposes ReSet, a method that outperforms multi-task learning with three-fold less data.

Modern language models (LMs) need to follow human instructions while being faithful; yet, they often fail to achieve both. Here, we provide concrete evidence of a trade-off between instruction following (i.e., follow open-ended instructions) and faithfulness (i.e., ground responses in given context) when training LMs with these objectives. For instance, fine-tuning LLaMA-7B on instruction following datasets renders it less faithful. Conversely, instruction-tuned Vicuna-7B shows degraded performance at following instructions when further optimized on tasks that require contextual grounding. One common remedy is multi-task learning (MTL) with data mixing, yet it remains far from achieving a synergic outcome. We propose a simple yet effective method that relies on Rejection Sampling for Continued Self-instruction Tuning (ReSet), which significantly outperforms vanilla MTL. Surprisingly, we find that less is more, as training ReSet with high-quality, yet substantially smaller data (three-fold less) yields superior results. Our findings offer a better understanding of objective discrepancies in alignment training of LMs.
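The abstract names rejection sampling as the core of ReSet but gives no procedural detail, so the following is only a minimal sketch of generic best-of-n rejection sampling, not the authors' implementation. The function name `rejection_sample`, the two scorer callbacks, the `min` combination of the two scores, and the `min_score` threshold are all assumptions made for illustration. The idea shown is that several candidate responses are sampled per prompt, each is scored for instruction following and for faithfulness, and only the best candidate that clears both bars is kept as data for continued fine-tuning.

```python
from typing import Callable, Dict, List


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],
    score_instruction: Callable[[str, str], float],
    score_faithfulness: Callable[[str, str], float],
    num_samples: int = 4,
    min_score: float = 0.5,
) -> List[Dict[str, str]]:
    """Keep, per prompt, the best candidate whose two scores both reach min_score.

    All callbacks are hypothetical stand-ins: `generate` samples candidates
    from the current model, and the two scorers rate a (prompt, response) pair.
    """
    kept: List[Dict[str, str]] = []
    for prompt in prompts:
        candidates = generate(prompt, num_samples)
        # Assumption: a candidate is only as good as its weaker objective,
        # so the combined score is the minimum of the two.
        scored = [
            (min(score_instruction(prompt, c), score_faithfulness(prompt, c)), c)
            for c in candidates
        ]
        if not scored:
            continue
        best_score, best = max(scored, key=lambda pair: pair[0])
        # Reject the prompt entirely if even the best candidate is too weak.
        if best_score >= min_score:
            kept.append({"prompt": prompt, "response": best})
    return kept
```

Under these assumptions, the returned list would serve as the smaller, filtered training set for a further round of fine-tuning, which is one way to read the abstract's remark that less data of higher quality gave better results.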
