CL · AI · Oct 29, 2025

The Limits of Obliviate: Evaluating Unlearning in LLMs via Stimulus-Knowledge Entanglement-Behavior Framework

arXiv:2510.25732v1
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of managing sensitive data and correcting misinformation in LLMs by providing a framework for assessing unlearning. The advance is incremental: it builds on existing theories and focuses on evaluation rather than proposing a new unlearning method.

The paper evaluates unlearning effectiveness in large language models (LLMs) by testing whether persuasive prompting can recall factual knowledge from unlearned models. It finds that persuasive prompts substantially enhance recall (from 14.8% at baseline to 24.5% with authority framing) and that the effect is inversely correlated with model size (128% recovery in the 2.7B model vs. 15% in the 13B model).

Unlearning in large language models (LLMs) is crucial for managing sensitive data and correcting misinformation, yet evaluating its effectiveness remains an open problem. We investigate whether persuasive prompting can recall factual knowledge from deliberately unlearned LLMs across models ranging from 2.7B to 13B parameters (OPT-2.7B, LLaMA-2-7B, LLaMA-3.1-8B, LLaMA-2-13B). Drawing from ACT-R and Hebbian theory (spreading activation theories), as well as communication principles, we introduce Stimulus-Knowledge Entanglement-Behavior Framework (SKeB), which models information entanglement via domain graphs and tests whether factual recall in unlearned models is correlated with persuasive framing. We develop entanglement metrics to quantify knowledge activation patterns and evaluate factuality, non-factuality, and hallucination in outputs. Our results show persuasive prompts substantially enhance factual knowledge recall (14.8% baseline vs. 24.5% with authority framing), with effectiveness inversely correlated to model size (128% recovery in 2.7B vs. 15% in 13B). SKeB provides a foundation for assessing unlearning completeness, robustness, and overall behavior in LLMs.
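To put the headline recall figures in perspective, the short sketch below computes the absolute and relative gain implied by moving from 14.8% to 24.5% factual recall. The helper name `relative_gain` is hypothetical and is not taken from the paper. The abstract does not define how its "recovery" percentages (128% and 15%) are computed, so this calculation should not be read as reproducing that metric.

```python
def relative_gain(baseline_pct: float, treated_pct: float) -> float:
    """Relative change of treated over baseline, in percent.

    Hypothetical helper for illustration; not the paper's "recovery" metric,
    which the abstract does not define.
    """
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive")
    return (treated_pct - baseline_pct) / baseline_pct * 100.0


# Figures quoted in the abstract (factual recall, in percent).
baseline = 14.8   # recall without persuasive framing
authority = 24.5  # recall with authority framing

absolute_gain = authority - baseline
print(f"absolute gain: {absolute_gain:.1f} percentage points")
print(f"relative gain: {relative_gain(baseline, authority):.1f}%")
# prints:
# absolute gain: 9.7 percentage points
# relative gain: 65.5%
```

In other words, authority framing raises recall by 9.7 percentage points, or roughly two thirds relative to the baseline.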

