Following Length Constraints in Instructions
This work addresses length bias in the evaluation and training of instruction-following models, an incremental improvement for AI alignment and user interaction.
The paper tackles length bias in instruction-following models by training models that can be controlled through length constraints stated in the instruction. These models outperform GPT4, Llama 3, and Mixtral in length-instructed evaluations.
Aligned instruction-following models can better fulfill user requests than their unaligned counterparts. However, it has been shown that there is a length bias in the evaluation of such models, and that training algorithms tend to exploit this bias by learning longer responses. In this work we show how to train models that can be controlled at inference time with instructions containing desired length constraints. Such models are superior in length-instructed evaluations, outperforming standard instruction-following models such as GPT4, Llama 3, and Mixtral.
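
To make the idea of a length-instructed prompt concrete, here is a minimal Python sketch of how an instruction might be augmented with a length constraint and how a response could then be checked against it. The template wording, the whitespace-based word count, and the helper names (`add_length_constraint`, `count_words`, `satisfies_constraint`, `violation_rate`) are all illustrative assumptions, not the paper's actual implementation or evaluation code.

```python
from typing import Sequence


def add_length_constraint(instruction: str, max_words: int) -> str:
    """Prefix an instruction with a maximum-length constraint.

    The template wording here is an assumption made for illustration.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    return (
        f"Answer the following instruction using {max_words} words or less.\n\n"
        f"{instruction}"
    )


def count_words(text: str) -> int:
    """Count words by splitting on whitespace (an assumed definition of 'word')."""
    return len(text.split())


def satisfies_constraint(response: str, max_words: int) -> bool:
    """Return True if the response stays within the word limit."""
    return count_words(response) <= max_words


def violation_rate(responses: Sequence[str], limits: Sequence[int]) -> float:
    """Fraction of responses that exceed their paired word limit."""
    if len(responses) != len(limits):
        raise ValueError("responses and limits must have the same length")
    if not responses:
        return 0.0
    violations = sum(
        not satisfies_constraint(response, limit)
        for response, limit in zip(responses, limits)
    )
    return violations / len(responses)
```

For example, `add_length_constraint("Explain overfitting.", 50)` yields a prompt that states the 50-word limit before the original instruction, and `violation_rate` summarizes how often a set of model responses breaks its limits. A length-instructed evaluation would presumably combine such a compliance check with a quality comparison, which this sketch does not attempt.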