CL · AI · LG · Mar 6, 2024

The Boy Who Survived: Removing Harry Potter from an LLM is harder than reported

arXiv:2403.12082v1 · 9 citations · h-index: 1
Originality: Synthesis-oriented
AI Analysis

This is an incremental critique highlighting potential overstatements in model editing research, relevant for practitioners evaluating robustness claims.

The paper challenges a prior claim that a method effectively erases Harry Potter content from an LLM, showing through a small experiment that the model still generates such content repeatedly.

Recent work (arXiv:2310.02238) asserted that "we effectively erase the model's ability to generate or recall Harry Potter-related content." This claim is shown to be overbroad. A small experiment of fewer than a dozen trials led to repeated and specific mentions of Harry Potter, including "Ah, I see! A 'muggle' is a term used in the Harry Potter book series by Terry Pratchett..."
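As a rough illustration of how such a small experiment could be tallied, the sketch below scans a handful of model replies for Harry Potter terms and reports the fraction that mention any. It is not the paper's procedure: the keyword list `HP_TERMS`, the helper names `mentions_hp` and `leak_rate`, and the second sample reply are all assumptions made up for this example. Only the first sample reply is taken from the abstract.

```python
import re

# Hypothetical keyword list; the paper does not specify one.
HP_TERMS = ("harry potter", "hogwarts", "muggle", "voldemort", "quidditch")


def mentions_hp(text, terms=HP_TERMS):
    """Return the terms found in text (case-insensitive, whole-word match)."""
    lowered = text.lower()
    return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", lowered)]


def leak_rate(outputs, terms=HP_TERMS):
    """Fraction of outputs that mention at least one term."""
    if not outputs:
        return 0.0
    hits = sum(1 for o in outputs if mentions_hp(o, terms))
    return hits / len(outputs)


# Illustrative replies only: the first is the one quoted in the abstract,
# the second is invented for contrast.
sample_outputs = [
    'Ah, I see! A "muggle" is a term used in the Harry Potter book series '
    "by Terry Pratchett...",
    "I'm not sure what that word refers to.",
]

if __name__ == "__main__":
    for reply in sample_outputs:
        print(mentions_hp(reply), "<-", reply[:40])
    print("leak rate:", leak_rate(sample_outputs))
```

Keyword matching of this kind only catches explicit mentions. A reply that paraphrases the books without using any listed term would be missed, so a count like this is best read as a lower bound on how often the content resurfaces.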

