CL · LG · Dec 4, 2023

Distilled Self-Critique of LLMs with Synthetic Data: a Bayesian Perspective

arXiv:2312.01957v3 · 6 citations · h-index: 10 · Has Code · Tiny Papers @ ICLR
Originality: Incremental advance
AI Analysis

This work addresses the challenge of aligning LLMs for improved safety and control, offering a potentially cheaper method, though it appears incremental as it builds on existing RLAIF and distillation techniques.

The paper tackles the problem of aligning large language models (LLMs) by proposing distilled Self-Critique (dSC), a method that refines LLM outputs with a Gibbs sampler and then distills the sampler into a fine-tuned model using only synthetic data. Experiments on safety, sentiment, and privacy control suggest it can be a viable and cheap alternative for alignment.

This paper proposes an interpretation of RLAIF as Bayesian inference by introducing distilled Self-Critique (dSC), which refines the outputs of an LLM through a Gibbs sampler that is later distilled into a fine-tuned model. Requiring only synthetic data, dSC is exercised in experiments on safety, sentiment, and privacy control, showing it can be a viable and cheap alternative for aligning LLMs. Code released at https://github.com/vicgalle/distilled-self-critique.
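To make the abstract's pipeline concrete, here is a minimal Python sketch of the general idea: alternate between critiquing and revising a response (a Gibbs-style loop), then keep the refined outputs as synthetic fine-tuning pairs. This is not the authors' implementation. Every function name, the reward filter, and the threshold are hypothetical, and the toy critique and revise functions are deterministic stand-ins for what would be stochastic LLM calls in practice.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins for LLM calls. In a real setup each of these would
# sample from a language model; here they are deterministic toy rules.
def toy_critique(prompt: str, response: str) -> str:
    """Stand-in for sampling a critique given the current response."""
    return "remove 'unsafe'" if "unsafe" in response else "ok"

def toy_revise(prompt: str, response: str, critique: str) -> str:
    """Stand-in for sampling a revised response given the critique."""
    if critique == "ok":
        return response
    return response.replace("unsafe", "safe")

def toy_reward(response: str) -> float:
    """Assumed scoring function used only to filter the synthetic data."""
    return 0.0 if "unsafe" in response else 1.0

def gibbs_self_critique(
    prompt: str,
    initial: str,
    critique_fn: Callable[[str, str], str],
    revise_fn: Callable[[str, str, str], str],
    steps: int = 2,
) -> str:
    """Alternate critique and revision steps, Gibbs-sampler style."""
    response = initial
    for _ in range(steps):
        critique = critique_fn(prompt, response)           # critique | response
        response = revise_fn(prompt, response, critique)   # response | critique
    return response

def build_distillation_set(
    prompts: List[str],
    draft_fn: Callable[[str], str],
    reward_fn: Callable[[str], float],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Collect (prompt, refined response) pairs as synthetic fine-tuning data."""
    dataset = []
    for prompt in prompts:
        refined = gibbs_self_critique(prompt, draft_fn(prompt), toy_critique, toy_revise)
        if reward_fn(refined) >= threshold:  # keep only acceptable samples
            dataset.append((prompt, refined))
    return dataset

if __name__ == "__main__":
    pairs = build_distillation_set(["q1"], lambda p: "this is unsafe", toy_reward)
    print(pairs)
```

The resulting pairs would then serve as supervised fine-tuning targets, so that the fine-tuned model produces the refined response directly without running the critique loop at inference time.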

Code Implementations: 1 repo
