CL · Apr 26, 2024

Quantifying Memorization and Detecting Training Data of Pre-trained Language Models using Japanese Newspaper

arXiv:2404.17143v2 · 23 citations · h-index: 3 · INLG
Originality: Synthesis-oriented
AI Analysis

The paper highlights the risk of data leakage from domain-specific PLMs, especially those trained on private data, but its contribution is incremental: it extends known English results to Japanese.

The study pre-trained domain-specific GPT-2 models on Japanese newspaper articles and found that memorization correlates with data duplication, model size, and prompt length, replicating English findings, and showed training data can be detected via membership inference attacks.

Dominant pre-trained language models (PLMs) have demonstrated the potential risk of memorizing and outputting their training data. While this concern has been discussed mainly for English, it is also practically important to examine domain-specific PLMs. In this study, we pre-trained domain-specific GPT-2 models on a limited corpus of Japanese newspaper articles and evaluated their behavior. Experiments replicated, in Japanese, the empirical finding from previous English studies that memorization in PLMs is related to duplication in the training data, model size, and prompt length. Furthermore, we attempted membership inference attacks and demonstrated that training data can be detected in Japanese as well, the same trend as in English. The study warns that domain-specific PLMs, sometimes trained on valuable private data, can "copy and paste" on a large scale.
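As a rough illustration of the two measurements the abstract mentions, the sketch below scores verbatim memorization as the number of leading tokens a generated continuation shares with the true article continuation, and flags likely training members with a simple loss threshold. Both choices are assumptions made for illustration: the abstract does not state the paper's exact memorization metric or attack, and the function names, the `min_match` cutoff, and the `threshold` value are hypothetical. The model itself is left out, so token lists and losses are passed in directly.

```python
from typing import List, Sequence, Tuple


def verbatim_prefix_length(generated: Sequence[str], reference: Sequence[str]) -> int:
    """Count the leading tokens of `generated` that match `reference` exactly."""
    count = 0
    for gen_tok, ref_tok in zip(generated, reference):
        if gen_tok != ref_tok:
            break
        count += 1
    return count


def memorization_rate(
    pairs: List[Tuple[Sequence[str], Sequence[str]]], min_match: int
) -> float:
    """Fraction of (generated, reference) pairs sharing at least `min_match`
    leading tokens. `min_match` is an illustrative cutoff, not a value from the paper."""
    if not pairs:
        return 0.0
    hits = sum(
        1 for gen, ref in pairs if verbatim_prefix_length(gen, ref) >= min_match
    )
    return hits / len(pairs)


def is_flagged_member(avg_token_loss: float, threshold: float) -> bool:
    """Loss-threshold membership inference (an assumed, generic attack):
    a low average loss suggests the text may have been seen in training."""
    return avg_token_loss < threshold
```

In practice, `generated` would come from prompting the pre-trained model with the opening of an article, `reference` would be the article's actual continuation, and `avg_token_loss` would be the model's mean per-token loss on a candidate text.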

