CL, AI, HC | Nov 2, 2023

The Impact of Preference Agreement in Reinforcement Learning from Human Feedback: A Case Study in Summarization

arXiv:2311.04919v1 | 3 citations | h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses the challenge of designing effective RLHF systems for summarization by highlighting the importance of preference agreement, offering insights for synthetic dataset creation and comparison-based data quality, though it is incremental in nature.

The study investigates how the level of human annotator agreement affects Reinforcement Learning from Human Feedback (RLHF) for text summarization. It finds that sampling preferences across a range of agreement levels yields higher-accuracy reward models, changes which quality characteristics those models capture, and improves downstream generation.

Reinforcement Learning from Human Feedback (RLHF) can be used to capture complex and nuanced properties of text generation quality. As a result, the task of text summarization has been identified as a good candidate for this process. In this paper, we explore how preference agreement impacts the efficacy of RLHF for summarization. We show that sampling human preferences to include a range of annotator agreement (1) results in higher-accuracy reward models and (2) alters the characteristics of quality captured. We additionally show improvements in downstream generation when using a reward model trained with a range of preference agreements. Our contributions have implications for the design of synthetic datasets as well as the importance of considering quality differentials in comparison-based data.
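To make "sampling human preferences to include a range of annotator agreement" concrete, here is a minimal sketch of one way such sampling could be done. It is not the authors' procedure. The record layout (a dict with an `agreement` field holding the fraction of annotators who chose the majority summary), the function name `sample_across_agreement`, and the bin edges `0.7` and `0.9` are all assumptions made for illustration.

```python
import bisect
import random
from collections import defaultdict


def sample_across_agreement(comparisons, n_per_bin, bin_edges=(0.7, 0.9), seed=0):
    """Draw up to n_per_bin comparisons from each agreement bin.

    comparisons: list of dicts, each with an "agreement" value in [0.5, 1.0]
                 (hypothetical field: share of annotators picking the majority choice).
    bin_edges:   sorted cut points; values below the first edge fall in bin 0,
                 values at or above the last edge fall in the final bin.
    """
    rng = random.Random(seed)

    # Group comparisons by agreement bin.
    bins = defaultdict(list)
    for item in comparisons:
        bin_index = bisect.bisect_right(bin_edges, item["agreement"])
        bins[bin_index].append(item)

    # Sample without replacement from each bin so every agreement level is represented.
    sample = []
    for bin_index in sorted(bins):
        pool = bins[bin_index]
        k = min(n_per_bin, len(pool))
        sample.extend(rng.sample(pool, k))
    return sample


# Illustrative data only: 15 made-up comparisons at three agreement levels.
toy = [{"id": i, "agreement": a} for i, a in enumerate([0.6] * 5 + [0.8] * 5 + [1.0] * 5)]
mixed = sample_across_agreement(toy, n_per_bin=2)
```

The resulting `mixed` list holds comparisons from low, medium, and high agreement bins in equal numbers (where available), which is the kind of training set the abstract contrasts with one drawn from a single agreement level.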

