CL, AI, LG · Apr 20, 2025

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data

arXiv:2504.14452v2 · 5 citations · h-index: 28
Originality: Incremental advance
AI Analysis

The paper addresses copyright, plagiarism, privacy, and creativity concerns for users of language models. It is rated incremental because it builds on existing unlearning methods, adding a novel fine-tuning approach.

The paper tackles the problem of language models unintentionally reproducing verbatim segments of their pre-training data, which raises copyright and privacy concerns. It introduces Paraphrase Preference Optimization (ParaPO), a post-training method that lowers regurgitation metrics (e.g., from 17.3 to 12.9 in creative writing) while preserving utility. A variant that uses system prompts lets the model still recall famous quotations when appropriate.

Language models (LMs) can memorize and reproduce segments from their pretraining data verbatim even in non-adversarial settings, raising concerns about copyright, plagiarism, privacy, and creativity. We introduce Paraphrase Preference Optimization (ParaPO), a post-training method that fine-tunes LMs to reduce unintentional regurgitation while preserving their overall utility. ParaPO trains LMs to prefer paraphrased versions of memorized segments over the original verbatim content from the pretraining data. To maintain the ability to recall famous quotations when appropriate, we develop a variant of ParaPO that uses system prompts to control regurgitation behavior. In our evaluation on Llama3.1-8B, ParaPO consistently reduces regurgitation across all tested datasets (e.g., reducing the regurgitation metric from 17.3 to 12.9 in creative writing), whereas unlearning methods used in prior work to mitigate regurgitation are less effective outside their targeted unlearned domain (from 17.3 to 16.9). When applied to the instruction-tuned Tulu3-8B model, ParaPO with system prompting successfully preserves famous quotation recall while reducing unintentional regurgitation (from 8.7 to 6.3 in creative writing) when prompted not to regurgitate. In contrast, without ParaPO tuning, prompting the model not to regurgitate produces only a marginal reduction (8.7 to 8.4).
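The abstract says only that ParaPO trains the model to prefer a paraphrase over the memorized original. As a rough illustration of what such a preference setup could look like, here is a minimal sketch that assumes a DPO-style objective. The names `PreferencePair`, `build_pair`, and `dpo_loss`, the `beta` value, and the choice of DPO itself are assumptions made for illustration and are not confirmed by the text above.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One training example: the paraphrase is preferred over the verbatim text."""
    prompt: str
    chosen: str    # paraphrased continuation (preferred)
    rejected: str  # verbatim memorized continuation (dispreferred)


def build_pair(prefix: str, verbatim: str, paraphrase: str) -> PreferencePair:
    # Hypothetical helper: the memorized segment becomes the rejected response.
    return PreferencePair(prompt=prefix, chosen=paraphrase, rejected=verbatim)


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO-style loss for one pair, given sequence log-probabilities.

    Assumption: a standard DPO objective, -log(sigmoid(margin)), where the
    margin compares policy-vs-reference log-ratios of chosen and rejected.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) == softplus(-m), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

Under this sketch, the loss falls below log 2 once the tuned model assigns relatively more probability to the paraphrase than the reference model does, and rises above it when the model still favors the verbatim segment.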
