CL | Apr 23, 2024

Does Instruction Tuning Make LLMs More Consistent?

arXiv:2404.15206v3 | 2 citations | h-index: 7
Originality: Incremental advance
AI Analysis

This paper addresses the reliability of large language models for their users. The contribution is incremental because it builds on known effects of instruction tuning.

The study investigates whether instruction tuning makes language models more consistent, that is, less sensitive to small input perturbations. It finds that instruction-tuned LLaMA models are generally more consistent than the original model in both their representations and their predictions.

The purpose of instruction tuning is to enable zero-shot performance, but instruction tuning has also been shown to improve chain-of-thought reasoning and value alignment (Si et al., 2023). Here we consider the impact on $\textit{consistency}$, i.e., the sensitivity of language models to small perturbations in the input. We compare 10 instruction-tuned LLaMA models to the original LLaMA-7b model and show that, almost across the board, they become more consistent, both in terms of their representations and their predictions in zero-shot and downstream tasks. We explain these improvements through mechanistic analyses of factual recall.
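To make the notion of consistency concrete, here is a minimal sketch of two ways it could be scored over a set of perturbed versions of one input. It is an illustration only, not the paper's metric. The function names, the pairwise-agreement score, the mean pairwise cosine similarity, and the toy data are all assumptions introduced here.

```python
from itertools import combinations
import math


def prediction_consistency(predictions):
    """Fraction of unordered pairs of perturbed inputs that get the same prediction.

    Hypothetical metric for illustration; the paper may define consistency differently.
    """
    pairs = list(combinations(predictions, 2))
    if not pairs:
        return 1.0  # a single input is trivially consistent with itself
    return sum(a == b for a, b in pairs) / len(pairs)


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def representation_consistency(vectors):
    """Mean pairwise cosine similarity of the representations of perturbed inputs.

    Also a hypothetical stand-in for a representation-level consistency score.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy data (invented): answers of two models to four paraphrases of one question.
base_preds = ["Paris", "Paris", "Lyon", "Paris"]
tuned_preds = ["Paris", "Paris", "Paris", "Paris"]

print(prediction_consistency(base_preds))   # 3 of 6 pairs agree
print(prediction_consistency(tuned_preds))  # all 6 pairs agree
```

Under this sketch, a model whose answers do not change across paraphrases scores 1.0, and the score falls as more paraphrase pairs disagree. A comparison like the one in the abstract would compute such scores for the original model and for each instruction-tuned variant on the same perturbed inputs.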

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
