CL, AI, LG | Nov 16, 2023

HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM

Zhilin Wang, Yi Dong, Jiaqi Zeng, Virginia Adams, Makesh Narsimhan Sreedhar, Daniel Egert, Olivier Delalleau, Jane Polak Scowcroft, Neel Kant, Aidan Swope, Oleksii Kuchaiev

NVIDIA

arXiv:2311.09528v123.3139 citations | h-index: 29 | Has Code

Originality: Incremental advance

AI Analysis

HelpSteer addresses ambiguous helpfulness annotations in AI training datasets, which enables more precise model steering for researchers and developers. The advance is incremental, since it builds on existing datasets and techniques.

The authors address the lack of specificity in existing helpfulness preference datasets, which can lead models to learn artifacts such as a preference for longer responses. They introduce HelpSteer, a multi-attribute dataset annotated for correctness, coherence, complexity, verbosity, and overall helpfulness. Llama 2 70B trained on HelpSteer with SteerLM scores 7.54 on MT Bench, which the authors report as the highest score at the time for open models that do not require training data from more powerful models.

Existing open-source helpfulness preference datasets do not specify what makes some responses more helpful and others less so. Models trained on these datasets can incidentally learn to model dataset artifacts (e.g. preferring longer but unhelpful responses only due to their length). To alleviate this problem, we collect HelpSteer, a multi-attribute helpfulness dataset annotated for the various aspects that make responses helpful. Specifically, our 37k-sample dataset has annotations for correctness, coherence, complexity, and verbosity in addition to overall helpfulness of responses. Training Llama 2 70B using the HelpSteer dataset with the SteerLM technique produces a model that scores 7.54 on MT Bench, which is currently the highest score for open models that do not require training data from more powerful models (e.g. GPT-4). We release this dataset under a CC-BY-4.0 license at https://huggingface.co/datasets/nvidia/HelpSteer
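To make the multi-attribute idea concrete, the following is a minimal sketch of how per-attribute annotations let one separate helpfulness from length. The five attribute names come from the abstract. Everything else is an assumption made for illustration: the `Sample` class, the integer 0-4 scale, the helper functions, and the toy records, which are invented and are not real HelpSteer data.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One annotated response. Field names mirror the attributes in the abstract.

    The 0-4 integer scale is an assumption made for this sketch.
    """
    prompt: str
    response: str
    helpfulness: int
    correctness: int
    coherence: int
    complexity: int
    verbosity: int


# Hypothetical toy records for illustration only (not taken from HelpSteer).
TOY_SAMPLES = [
    Sample("What is 2 + 2?", "4.", 4, 4, 4, 0, 0),
    Sample(
        "What is 2 + 2?",
        "Arithmetic has a long history, and after much thought the answer may be 5.",
        0, 0, 2, 2, 4,
    ),
    Sample(
        "Explain TCP vs UDP.",
        "TCP is reliable and ordered; UDP is neither but has lower overhead.",
        3, 4, 4, 2, 1,
    ),
]


def concise_and_helpful(samples, min_helpfulness=3, max_verbosity=1):
    """Keep responses rated helpful without being long-winded."""
    return [
        s for s in samples
        if s.helpfulness >= min_helpfulness and s.verbosity <= max_verbosity
    ]


def mean_attribute(samples, name):
    """Average one attribute (e.g. "verbosity") over a list of samples."""
    return sum(getattr(s, name) for s in samples) / len(samples)


if __name__ == "__main__":
    kept = concise_and_helpful(TOY_SAMPLES)
    print(f"kept {len(kept)} of {len(TOY_SAMPLES)} samples")
    print(f"mean verbosity of kept samples: {mean_attribute(kept, 'verbosity'):.2f}")
```

With a single overall preference label, the long but wrong second response could not be told apart from a long and correct one. Separate verbosity and correctness scores make that distinction explicit. To try the same filter on the released data, one option is `load_dataset("nvidia/HelpSteer")` from the Hugging Face `datasets` library, after checking the actual column names and score ranges on the dataset card.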
