CL LG Dec 4, 2023

Distilled Self-Critique of LLMs with Synthetic Data: a Bayesian Perspective

arXiv:2312.01957v32.16 citations h-index: 2 Has Code Tiny Papers @ ICLR

Originality Incremental advance

AI Analysis

This work addresses the challenge of aligning LLMs for improved safety and control, offering a potentially cheaper method, though it appears incremental as it builds on existing RLAIF and distillation techniques.

The paper tackles the problem of aligning large language models (LLMs) by proposing distilled Self-Critique (dSC), a method that refines LLM outputs with a Gibbs sampler and then distills that sampler into a fine-tuned model using only synthetic data. Experiments on safety, sentiment, and privacy control show that it can be a viable and cheap alternative for alignment.

This paper proposes an interpretation of RLAIF as Bayesian inference by introducing distilled Self-Critique (dSC), which refines the outputs of an LLM through a Gibbs sampler that is later distilled into a fine-tuned model. Requiring only synthetic data, dSC is exercised in experiments regarding safety, sentiment, and privacy control, showing it can be a viable and cheap alternative to align LLMs. Code released at https://github.com/vicgalle/distilled-self-critique.
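The abstract's refine-then-distill loop can be illustrated with a minimal sketch. This is not the authors' implementation: every function name here (`refine`, `build_synthetic_dataset`, the `toy_*` stand-ins) is hypothetical, the LLM calls are replaced by deterministic string functions, and the score-threshold filter is an assumption added for illustration. The sketch alternates between sampling a critique given the current response and a revised response given that critique, in the spirit of a Gibbs sweep, and collects the surviving (prompt, response) pairs as synthetic fine-tuning data.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures; a real system would back each of these with LLM calls.
Generate = Callable[[str], str]            # prompt -> initial response
Critique = Callable[[str, str], str]       # (prompt, response) -> critique
Revise = Callable[[str, str, str], str]    # (prompt, response, critique) -> revision
Score = Callable[[str, str], float]        # (prompt, response) -> scalar score


def refine(prompt: str, generate: Generate, critique: Critique,
           revise: Revise, steps: int = 2) -> str:
    """Alternate critique and revision steps, Gibbs-style, for a fixed number of sweeps."""
    response = generate(prompt)
    for _ in range(steps):
        feedback = critique(prompt, response)          # critique given response
        response = revise(prompt, response, feedback)  # response given critique
    return response


def build_synthetic_dataset(prompts: List[str], generate: Generate,
                            critique: Critique, revise: Revise, score: Score,
                            threshold: float = 0.5,
                            steps: int = 2) -> List[Tuple[str, str]]:
    """Collect refined (prompt, response) pairs for later supervised fine-tuning.

    The threshold filter is an assumption of this sketch, not taken from the abstract.
    """
    dataset = []
    for prompt in prompts:
        response = refine(prompt, generate, critique, revise, steps)
        if score(prompt, response) >= threshold:
            dataset.append((prompt, response))
    return dataset


# Toy stand-ins so the sketch runs without a model (sentiment-control flavour).
def toy_generate(prompt: str) -> str:
    return "I hate this film."


def toy_critique(prompt: str, response: str) -> str:
    return "too negative" if "hate" in response else "ok"


def toy_revise(prompt: str, response: str, feedback: str) -> str:
    return response.replace("hate", "like") if feedback != "ok" else response


def toy_score(prompt: str, response: str) -> float:
    return 0.0 if "hate" in response else 1.0


pairs = build_synthetic_dataset(["Review the film."], toy_generate,
                                toy_critique, toy_revise, toy_score)
print(pairs)
```

In a real pipeline the resulting pairs would be passed to a standard supervised fine-tuning step, so that the fine-tuned model produces the refined responses directly without running the critique loop at inference time.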
