CL, AI · Aug 8, 2023

Shepherd: A Critic for Language Model Generation

Tianlu Wang, Ping Yu, Xiaoqing Ellen Tan, Sean O'Brien, Ramakanth Pasunuru, Jane Dwivedi-Yu, Olga Golovneva, Luke Zettlemoyer, Maryam Fazel-Zarandi, Asli Celikyilmaz

Berkeley, Meta AI, Microsoft, U of Toronto, UW

arXiv:2308.04592v1 · 18.0113 citations · h-index: 116 · Has Code

Originality: Incremental advance

AI Analysis

This paper provides a method for improving language model generation, though the advance is incremental because it builds on existing tuning and feedback techniques.

The paper tackles the problem of refining language model outputs by introducing Shepherd, a 7B-parameter model tuned to critique and suggest improvements, achieving a 53-87% win-rate over alternatives in GPT-4 evaluation and closely tying with ChatGPT in human evaluations.

As large language models improve, there is increasing interest in techniques that leverage these models' capabilities to refine their own outputs. In this work, we introduce Shepherd, a language model specifically tuned to critique responses and suggest refinements, extending beyond the capabilities of an untuned model to identify diverse errors and provide suggestions to remedy them. At the core of our approach is a high quality feedback dataset, which we curate from community feedback and human annotations. Even though Shepherd is small (7B parameters), its critiques are either equivalent or preferred to those from established models including ChatGPT. Using GPT-4 for evaluation, Shepherd reaches an average win-rate of 53-87% compared to competitive alternatives. In human evaluation, Shepherd strictly outperforms other models and on average closely ties with ChatGPT.
