Data Cartography for Detecting Memorization Hotspots and Guiding Data Interventions in Generative Models
This work addresses data leakage and memorization in generative models, a concern for AI security and privacy. It offers a novel method for a known bottleneck rather than a foundational breakthrough.
The paper tackles the problem of generative models overfitting to and memorizing rare training examples, which adversaries can then extract. It introduces Generative Data Cartography (GenDataCarto), a framework that reduces synthetic canary extraction success by over 40% when pruning only 10% of the data, while increasing validation perplexity by less than 0.5%.
Modern generative models risk overfitting and unintentionally memorizing rare training examples, which can be extracted by adversaries or inflate benchmark performance. We propose Generative Data Cartography (GenDataCarto), a data-centric framework that assigns each pretraining sample a difficulty score (early-epoch loss) and a memorization score (frequency of ``forget events''), then partitions examples into four quadrants to guide targeted pruning and up-/down-weighting. We prove that our memorization score lower-bounds classical influence under smoothness assumptions and that down-weighting high-memorization hotspots provably decreases the generalization gap via uniform stability bounds. Empirically, GenDataCarto reduces synthetic canary extraction success by over 40\% at just 10\% data pruning, while increasing validation perplexity by less than 0.5\%. These results demonstrate that principled data interventions can dramatically mitigate leakage with minimal cost to generative performance.
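The two per-sample scores and the four-quadrant partition described above can be illustrated with a minimal sketch. The abstract does not give exact definitions, so everything below is an assumption made for illustration: difficulty is taken as the mean loss over the first few epochs, a forget event is taken as a transition from loss at or below a threshold to loss above it, the cut points default to medians, and the quadrant labels (`easy_stable`, `hard_stable`, `easy_forgotten`, `hotspot`) and function names are hypothetical rather than the paper's own.

```python
from statistics import median


def difficulty_score(losses, early_epochs=2):
    """Mean loss over the first `early_epochs` epochs (assumed definition).

    `losses` must be a non-empty list of per-epoch losses for one sample.
    """
    early = losses[:early_epochs]
    return sum(early) / len(early)


def memorization_score(losses, loss_threshold=1.0):
    """Fraction of epoch-to-epoch transitions that are forget events.

    Assumption: a forget event is a move from loss <= loss_threshold
    ("learned") to loss > loss_threshold ("forgotten").
    """
    transitions = list(zip(losses, losses[1:]))
    if not transitions:
        return 0.0
    forgets = sum(1 for prev, cur in transitions if prev <= loss_threshold < cur)
    return forgets / len(transitions)


def assign_quadrants(histories, difficulty_cut=None, memorization_cut=None):
    """Map each sample id to one of four hypothetical quadrant labels."""
    d = {k: difficulty_score(v) for k, v in histories.items()}
    m = {k: memorization_score(v) for k, v in histories.items()}
    # Default cuts are medians; the source does not state how cuts are chosen.
    if difficulty_cut is None:
        difficulty_cut = median(d.values())
    if memorization_cut is None:
        memorization_cut = median(m.values())
    labels = {}
    for k in histories:
        hard = d[k] > difficulty_cut
        memorized = m[k] > memorization_cut
        if memorized:
            labels[k] = "hotspot" if hard else "easy_forgotten"
        else:
            labels[k] = "hard_stable" if hard else "easy_stable"
    return labels


# Toy per-epoch loss histories (made-up numbers, not results from the paper).
histories = {
    "a": [0.5, 0.4, 0.3, 0.2],  # low early loss, never forgotten
    "b": [3.0, 2.5, 0.8, 0.6],  # high early loss, learned once and kept
    "c": [0.6, 1.5, 0.7, 1.8],  # low early loss, forgotten twice
    "d": [3.0, 0.9, 2.0, 0.8],  # high early loss, forgotten once
}
labels = assign_quadrants(histories, difficulty_cut=1.5, memorization_cut=0.25)
```

Under these assumptions, a pruning or down-weighting step would act on the samples labeled `hotspot` (and possibly `easy_forgotten`), for example by dropping them or scaling their loss weight. How the actual framework selects and weights these samples is not specified in the abstract.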