CL · Aug 20, 2024

Inside the Black Box: Detecting Data Leakage in Pre-trained Language Encoders

Yuan Xin, Zheng Li, Ning Yu, Dingfan Chen, Mario Fritz, Michael Backes, Yang Zhang

arXiv:2408.11046v1 · 1.92 citations · h-index: 58

Originality: Incremental advance

AI Analysis

The work addresses privacy and copyright concerns for users and developers of NLP models, though it is incremental: it builds on existing risk assessments by focusing on one overlooked aspect.

The paper tackles the problem of data leakage in pre-trained language encoders, revealing through experiments across multiple architectures and tasks that membership leakage exists even with black-box access to downstream models, indicating a greater privacy risk than previously thought.

Despite being prevalent in the general field of Natural Language Processing (NLP), pre-trained language models inherently carry privacy and copyright concerns due to their nature of training on large-scale web-scraped data. In this paper, we pioneer a systematic exploration of such risks associated with pre-trained language encoders, specifically focusing on the membership leakage of pre-training data exposed through downstream models adapted from pre-trained language encoders, an aspect largely overlooked in existing literature. Our study encompasses comprehensive experiments across four types of pre-trained encoder architectures, three representative downstream tasks, and five benchmark datasets. Intriguingly, our evaluations reveal, for the first time, the existence of membership leakage even when only the black-box output of the downstream model is exposed, highlighting a privacy risk far greater than previously assumed. Alongside, we present in-depth analysis and insights toward guiding future researchers and practitioners in addressing the privacy considerations in developing pre-trained language models.

