LG, AI, Dec 8, 2024

Classifier-free guidance in LLMs Safety

arXiv:2412.06846v1, 1 citation, h-index: 1
Originality: Incremental advance
AI Analysis

This addresses safety concerns in LLMs by enabling effective unlearning of harmful content, though it appears incremental as it builds on existing ORPO and classifier-free guidance methods.

The paper tackles unlearning harmful content from large language models without a retaining dataset. It reports a significant improvement in unlearning without degrading model performance, using a CFG-aware training regime on synthetic replacement data and modified classifier-free guidance at inference.

The paper describes LLM unlearning without a retaining dataset, using the ORPO reinforcement learning method with inference enhanced by modified classifier-free guidance. A significant improvement in unlearning, without degradation of the model, is achieved through direct training on synthetic replacement data in a CFG-aware training regime, with classifier-free guidance applied during inference. This article is an extended version of the NeurIPS 2024 LLM-PC submission, which was awarded second prize.
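For readers unfamiliar with classifier-free guidance (CFG) on language models, the sketch below shows the standard formulation only: next-token log-probabilities from a conditioned pass and an unconditioned pass are combined as uncond + scale * (cond - uncond). The abstract does not specify the paper's modification, so this is not the authors' method. The function names, the toy three-token vocabulary, and the guidance scale values are assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cfg_scores(cond_logits, uncond_logits, guidance_scale):
    """Standard CFG mix of two next-token distributions (hypothetical helper).

    guidance_scale = 0 returns the unconditioned log-probabilities,
    guidance_scale = 1 returns the conditioned ones, and values above 1
    amplify whatever the conditioning changed.
    """
    cond = log_softmax(cond_logits)
    uncond = log_softmax(uncond_logits)
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


def greedy_token(scores):
    """Index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: both passes prefer token 0, but conditioning raises token 1
# and lowers token 2 relative to the unconditioned pass.
cond_logits = [2.0, 1.0, 0.0]
uncond_logits = [2.0, 0.0, 1.0]

plain_choice = greedy_token(cfg_scores(cond_logits, uncond_logits, 1.0))
guided_choice = greedy_token(cfg_scores(cond_logits, uncond_logits, 3.0))
print(plain_choice, guided_choice)
```

In this toy case a scale of 1 leaves the conditioned model's preference unchanged, while a scale of 3 shifts the greedy choice to the token that the conditioning favoured. In practice the two logit vectors would come from two forward passes of the same model, with and without the guiding context.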

Code Implementations: 1 repo