Mitigating Unintended Memorization in Language Models via Alternating Teaching
This work addresses the privacy risks that language models pose to users and organizations, but the contribution is incremental because it builds on existing teacher-student frameworks.
The paper tackles the problem of language models memorizing sensitive training data. It proposes an alternating teaching method that uses multiple teachers trained on disjoint datasets and reports better privacy preservation than competing methods on LibriSpeech, with small utility loss when training records are sufficient.
Recent research has shown that language models tend to memorize rare or unique sequences in their training corpora and can thus leak sensitive attributes of user data. We employ a teacher-student framework and propose a novel approach called alternating teaching to mitigate unintended memorization in sequential modeling. In our method, multiple teachers are trained on disjoint training sets whose privacy one wishes to protect, and the teachers' predictions supervise the training of a student model in an alternating manner at each time step. Experiments on LibriSpeech datasets show that the proposed method achieves better privacy-preserving results than its counterparts. Compared with training without any protection against unintended memorization, the overall utility loss is small when training records are sufficient.
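The alternating supervision described in the abstract can be illustrated with a minimal sketch. It assumes a round-robin schedule (teacher `t mod K` supervises time step `t`) and a soft cross-entropy distillation loss between the chosen teacher's next-token distribution and the student's. The abstract does not specify either choice, so the paper's actual schedule and loss may differ. All names (`select_teacher`, `alternating_targets`, `soft_cross_entropy`) are hypothetical and used here only for illustration.

```python
import math
from typing import List

Distribution = List[float]  # probabilities over the vocabulary at one time step


def select_teacher(t: int, num_teachers: int) -> int:
    """Hypothetical schedule (assumption): cycle through teachers round-robin by time step."""
    return t % num_teachers


def alternating_targets(teacher_probs: List[List[Distribution]]) -> List[Distribution]:
    """Build one soft target per time step, each taken from a single teacher.

    teacher_probs[k][t] is teacher k's predicted next-token distribution at
    step t for the same input prefix. Each teacher was trained on a disjoint
    partition, and no single teacher supervises every step of a sequence.
    """
    num_teachers = len(teacher_probs)
    seq_len = len(teacher_probs[0])
    return [teacher_probs[select_teacher(t, num_teachers)][t] for t in range(seq_len)]


def soft_cross_entropy(student_probs: List[Distribution], targets: List[Distribution]) -> float:
    """Mean over time steps of -sum_v q_t(v) * log p_t(v) (assumed distillation loss)."""
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for p, q in zip(student_probs, targets):
        total += -sum(qv * math.log(pv + eps) for pv, qv in zip(p, q))
    return total / len(targets)


# Toy example: 2 teachers, 3 time steps, vocabulary of size 2.
teachers = [
    [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]],  # teacher 0
    [[0.2, 0.8], [0.3, 0.7], [0.4, 0.6]],  # teacher 1
]
targets = alternating_targets(teachers)  # steps 0 and 2 from teacher 0, step 1 from teacher 1
student = [[0.5, 0.5]] * 3               # an untrained, uniform student
loss = soft_cross_entropy(student, targets)
```

In a real training loop, the student's parameters would be updated to minimize this loss. Interleaving teachers per step means each supervision signal comes from a model that saw only one disjoint partition of the protected data.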