CL · AI · May 17, 2023

SLiC-HF: Sequence Likelihood Calibration with Human Feedback

arXiv:2305.10425v1, 408 citations
Originality: Incremental advance
AI Analysis

SLiC-HF provides a simpler and more efficient method for aligning language models with human preferences. The advance is incremental: it adapts an existing calibration technique, Sequence Likelihood Calibration (SLiC), to human feedback.

The paper tackles the problem of aligning language models with human preferences. It introduces SLiC-HF, which applies Sequence Likelihood Calibration to human feedback data. On the TL;DR summarization task, SLiC-HF significantly improves on supervised fine-tuning baselines and is a competitive alternative to PPO-based RLHF, while being simpler to implement and more computationally efficient.

Learning from human feedback has been shown to be effective at aligning language models with human preferences. Past work has often relied on Reinforcement Learning from Human Feedback (RLHF), which optimizes the language model using reward scores assigned by a reward model trained on human preference data. In this work we show how the recently introduced Sequence Likelihood Calibration (SLiC) can also be used to effectively learn from human preferences (SLiC-HF). Furthermore, we demonstrate this can be done with human feedback data collected for a different model, similar to off-policy, offline RL data. Automatic and human evaluation experiments on the TL;DR summarization task show that SLiC-HF significantly improves on supervised fine-tuning baselines. Furthermore, SLiC-HF presents a competitive alternative to the PPO RLHF implementation used in past work while being much simpler to implement, easier to tune and more computationally efficient in practice.
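The abstract does not spell out the training objective. As a minimal sketch, assuming a hinge-style rank calibration loss on sequence log-likelihoods plus a regularization term toward a reference (SFT target) sequence, the per-example loss might look like the following. The function name, argument names, and the default values of `delta` and `lam` are illustrative assumptions, not taken from the text above.

```python
def slic_hf_loss(logp_pos, logp_neg, logp_ref, delta=1.0, lam=0.5):
    """Per-example calibration loss (illustrative sketch, not the authors' code).

    logp_pos: log-likelihood of the human-preferred sequence under the model
    logp_neg: log-likelihood of the dispreferred sequence under the model
    logp_ref: log-likelihood of a reference (e.g. SFT target) sequence
    delta:    assumed margin by which the preferred sequence should win
    lam:      assumed weight of the regularization term
    """
    # Hinge term: zero once the preferred sequence beats the dispreferred
    # one by at least `delta` in log-likelihood.
    rank_loss = max(0.0, delta - logp_pos + logp_neg)
    # Regularization: keep the reference sequence likely, so the model
    # stays close to its supervised fine-tuned behaviour.
    reg_loss = -lam * logp_ref
    return rank_loss + reg_loss


# Preferred sequence already wins by more than the margin: only the
# regularization term contributes.
satisfied = slic_hf_loss(logp_pos=-2.0, logp_neg=-5.0, logp_ref=-3.0)

# Preference is violated: the hinge term is active.
violated = slic_hf_loss(logp_pos=-5.0, logp_neg=-2.0, logp_ref=-3.0)
```

Unlike PPO, a loss of this shape needs no sampling loop or value function during training; it only requires log-likelihoods of fixed sequences, which is consistent with the abstract's claim that the approach is simpler to implement and can reuse preference data collected for a different model.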

