CL, AI, LG | Apr 17, 2025

Memorization: A Close Look at Books

arXiv:2504.12549v2 | 1 citation | h-index: 4 | Proceedings of the First Workshop on Large Language Model Memorization (L2M2)
Originality: Incremental advance
AI Analysis

This work highlights weaknesses in current memorization mitigation strategies for LLMs, a concern for AI safety and privacy. The advance is incremental, since it builds on existing extraction techniques.

The study investigates how much of an entire book can be extracted from LLMs, using the Llama 3 70B family of models. It reconstructs 'Alice's Adventures in Wonderland' with high similarity from only the first 500 tokens, and finds that extraction rates for other books vary with book popularity.

To what extent can entire books be extracted from LLMs? Using the Llama 3 70B family of models and the "prefix-prompting" extraction technique, we were able to auto-regressively reconstruct, with a very high level of similarity, one entire book (Alice's Adventures in Wonderland) from just the first 500 tokens. We were also able to obtain high extraction rates on several other books, piece-wise. However, these successes do not extend uniformly to all books. We show that extraction rates of books correlate with book popularity and thus, likely, with duplication in the training data. We also confirm the undoing of mitigations in the instruction-tuned Llama 3.1, following recent work (Nasr et al., 2025). We further find that this undoing comes from changes to only a tiny fraction of weights, concentrated primarily in the lower transformer blocks. Our results provide evidence of the limits of current regurgitation mitigation strategies and introduce a framework for studying how fine-tuning affects the retrieval of verbatim memorization in aligned LLMs.
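The auto-regressive reconstruction described in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `reconstruct`, `similarity`, and `make_toy_model` are hypothetical, the chunk and context sizes are arbitrary, the similarity measure (a `difflib` ratio over token lists) is a stand-in for whatever metric the paper uses, and the toy lookup "model" takes the place of an actual Llama 3 70B generation call.

```python
from difflib import SequenceMatcher
from typing import Callable, List

Tokens = List[str]
# Hypothetical interface: (context tokens, number of tokens wanted) -> continuation.
Generate = Callable[[Tokens, int], Tokens]


def reconstruct(book: Tokens, generate: Generate, prefix_len: int = 500,
                chunk_len: int = 50, context_len: int = 500) -> Tokens:
    """Seed with the true first `prefix_len` tokens, then keep feeding the
    model its OWN output as context until the book length is reached."""
    output = list(book[:prefix_len])
    while len(output) < len(book):
        wanted = min(chunk_len, len(book) - len(output))
        new = generate(output[-context_len:], wanted)[:wanted]
        if not new:  # the model produced nothing; stop early
            break
        output.extend(new)
    return output


def similarity(a: Tokens, b: Tokens) -> float:
    """Token-level similarity in [0, 1]; an assumed metric, not the paper's."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def make_toy_model(memorized: Tokens, key_len: int = 8) -> Generate:
    """Stand-in for an LLM: continues a context only if its last tokens
    appear verbatim in the 'memorized' text."""
    def generate(context: Tokens, n: int) -> Tokens:
        k = min(len(context), key_len)
        key = context[-k:]
        for i in range(len(memorized) - k + 1):
            if memorized[i:i + k] == key:
                return memorized[i + k:i + k + n]
        return []
    return generate


if __name__ == "__main__":
    book = [f"w{i}" for i in range(1200)]  # placeholder tokens, not real text
    rebuilt = reconstruct(book, make_toy_model(book))
    print(len(rebuilt), similarity(rebuilt, book))
```

In a real run, `generate` would wrap a call to the language model, and any error in one generated chunk would propagate into the context for the next. That compounding is what makes whole-book reconstruction a stricter test than piece-wise extraction, where each prompt is taken from the true text.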

