LG | AI | Dec 8, 2024

Classifier-free guidance in LLMs Safety

arXiv:2412.06846v1 | 1 citation | h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses safety concerns in LLMs by enabling effective unlearning of harmful content. It appears incremental, since it builds on the existing ORPO and classifier-free guidance methods.

The paper tackles the problem of unlearning harmful content from large language models without needing a retaining dataset. It reports significant improvement in unlearning without degrading model performance, achieved through a CFG-aware training regime with synthetic replacement data and modified classifier-free guidance (CFG) during inference.
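To make the inference-time step concrete, here is a minimal sketch of standard classifier-free guidance applied to next-token distributions. It is not the paper's modified variant, whose details are not given in this summary. The names `cfg_logits` and `guidance_scale` and the toy logits are illustrative assumptions.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def softmax(logits):
    return [math.exp(x) for x in log_softmax(logits)]

def cfg_logits(cond_logits, uncond_logits, guidance_scale):
    # Standard CFG: start from the unconditional log-probabilities and move
    # toward the conditional ones by a factor of guidance_scale.
    # A scale of 1.0 recovers the conditional distribution; larger values
    # amplify whatever the conditioning prompt changes.
    cond = log_softmax(cond_logits)
    uncond = log_softmax(uncond_logits)
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]

# Toy vocabulary of three tokens (hypothetical values).
cond_logits = [2.0, 0.5, -1.0]    # model output with the guiding prompt
uncond_logits = [1.0, 1.0, 0.0]   # model output without it
guided_probs = softmax(cfg_logits(cond_logits, uncond_logits, guidance_scale=1.5))
```

In an actual LLM, both sets of logits would come from two forward passes over the same continuation, one with and one without the guiding prompt; the guided scores are renormalized before sampling.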

The paper describes LLM unlearning without a retaining dataset, using the ORPO reinforcement learning method with inference enhanced by modified classifier-free guidance. Significant improvement in unlearning, without degradation of the model, is achieved through direct training on synthetic replacement data in a CFG-aware training regime, with classifier-free guidance applied during inference. This article is an extended version of the NeurIPS 2024 LLM-PC submission, which was awarded second prize.
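For readers unfamiliar with ORPO (odds ratio preference optimization), the sketch below computes its per-example loss from length-averaged log-probabilities of a preferred and a rejected response. Treating the synthetic replacement answer as "chosen" and the content to be forgotten as "rejected" is an assumption about how the method might be applied to unlearning, not something this summary confirms. The function names and the weight `lam` are illustrative.

```python
import math

def log_odds(avg_logprob):
    # odds(y|x) = p / (1 - p), where p = exp(average token log-probability).
    # avg_logprob must be strictly negative so that p < 1.
    p = math.exp(avg_logprob)
    return avg_logprob - math.log1p(-p)

def orpo_loss(chosen_avg_logprob, rejected_avg_logprob, lam=0.1):
    # Supervised term: negative log-likelihood of the chosen response.
    nll = -chosen_avg_logprob
    # Odds-ratio term: -log(sigmoid(log-odds difference)), written as a
    # numerically stable softplus of the negated difference.
    diff = log_odds(chosen_avg_logprob) - log_odds(rejected_avg_logprob)
    or_term = max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return nll + lam * or_term

# Hypothetical values: the replacement answer is already more likely
# than the answer that should be unlearned.
loss = orpo_loss(chosen_avg_logprob=-0.5, rejected_avg_logprob=-2.0)
```

The loss falls as the chosen response becomes more likely and as its odds grow relative to the rejected response, so a single objective both teaches the replacement and pushes probability away from the rejected text.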
