CL LG
Jun 18, 2025

Learning-Time Encoding Shapes Unlearning in LLMs

Ruihan Wu, Konstantin Garov, Kamalika Chaudhuri

arXiv:2506.15076v1
2.7
h-index: 4

Originality: Incremental advance

AI Analysis

This work addresses the need for reliable post-hoc unlearning in LLMs, for applications such as privacy compliance and content correction, but it is an incremental advance because it builds on prior unlearning benchmarks and algorithms.

The paper examines how learning-time choices in knowledge encoding affect the unlearning of factual knowledge in large language models. It finds that learning with paraphrased descriptions improves unlearning performance and that unlearning an individual piece of knowledge from a chunk of text is challenging.

As large language models (LLMs) are increasingly deployed in the real world, the ability to "unlearn", or remove specific pieces of knowledge post hoc, has become essential for a variety of reasons ranging from privacy regulations to correcting outdated or harmful content. Prior work has proposed unlearning benchmarks and algorithms, and has typically assumed that the training process and the target model are fixed. In this work, we empirically investigate how learning-time choices in knowledge encoding impact the effectiveness of unlearning factual knowledge. Our experiments reveal two key findings: (1) learning with paraphrased descriptions improves unlearning performance and (2) unlearning an individual piece of knowledge from a chunk of text is challenging. Our results suggest that learning-time knowledge encoding may play a central role in enabling reliable post-hoc unlearning.
