CL, AI, LG | May 23, 2023

Aligning Large Language Models through Synthetic Feedback

arXiv:2305.13735v2 | 173 citations | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of aligning LLMs for AI safety and usability and offers a more accessible method. The advance is incremental because it builds on existing alignment techniques.

The paper tackles the problem of aligning large language models to human values without relying on extensive human annotations or proprietary models. It proposes a framework built on synthetic feedback, and the resulting model outperforms recent open-source models on alignment benchmarks and in human evaluations.

Aligning large language models (LLMs) to human values has become increasingly important as it enables sophisticated steering of LLMs. However, it requires significant human demonstrations and feedback or distillation from proprietary LLMs such as ChatGPT. In this work, we propose a novel alignment learning framework with synthetic feedback not dependent on extensive human annotations and proprietary LLMs. First, we perform reward modeling (RM) with synthetic feedback by contrasting responses from vanilla LLMs with various sizes and prompts. Then, we use the RM to simulate high-quality demonstrations to train a supervised policy and further optimize the model with reinforcement learning. Our resulting model, Aligned Language Model with Synthetic Training dataset (ALMoST), outperforms recent open-sourced models, which are trained on the outputs of InstructGPT or human-annotated demonstrations, in alignment benchmarks. In human evaluation, our model is preferred to Alpaca and Dolly-v2, 55.0% and 58.5% of the time, respectively. Further analyses demonstrate the efficacy and importance of synthetic feedback in our framework. The code is available at https://github.com/naver-ai/almost
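The reward-modeling step in the abstract, contrasting responses from vanilla LLMs of various sizes and prompts, can be illustrated with a minimal sketch. It assumes a ranking rule that the abstract does not spell out: a response from a larger model given more in-context demonstrations is treated as the preferred one. The names `Candidate`, `dominates`, and `synthetic_comparisons` are hypothetical and are not taken from the released code.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Candidate:
    """One response plus the configuration that produced it (hypothetical schema)."""
    model_size_b: float  # parameter count in billions
    num_demos: int       # in-context demonstrations included in the prompt
    response: str


def dominates(a: Candidate, b: Candidate) -> bool:
    """Assumed rule: a is preferred if it is at least as strong on both axes
    and strictly stronger on at least one."""
    at_least = a.model_size_b >= b.model_size_b and a.num_demos >= b.num_demos
    strictly = a.model_size_b > b.model_size_b or a.num_demos > b.num_demos
    return at_least and strictly


def synthetic_comparisons(prompt: str, candidates: list[Candidate]) -> list[dict]:
    """Build (chosen, rejected) pairs without any human labels.

    Pairs where neither configuration dominates the other are skipped,
    because the assumed rule gives no clear preference for them.
    """
    pairs = []
    for a, b in combinations(candidates, 2):
        if dominates(a, b):
            chosen, rejected = a, b
        elif dominates(b, a):
            chosen, rejected = b, a
        else:
            continue
        pairs.append(
            {"prompt": prompt, "chosen": chosen.response, "rejected": rejected.response}
        )
    return pairs


if __name__ == "__main__":
    # Placeholder strings stand in for real model outputs.
    demo = [
        Candidate(30, 3, "response A"),
        Candidate(13, 1, "response B"),
    ]
    for pair in synthetic_comparisons("How do I boil an egg?", demo):
        print(pair)
```

Pairs in this prompt/chosen/rejected form are the usual input for training a pairwise reward model. Whether the authors filter or post-process the comparisons further is not stated in the abstract.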

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
