CL · Oct 17, 2023

Disentangling the Linguistic Competence of Privacy-Preserving BERT

Stefan Arnold, Nils Kemmerzell, Annika Schreiner

arXiv:2310.11363

Originality: Synthesis-oriented

AI Analysis

This paper addresses the problem of privacy-preserving language models for NLP practitioners, but its contribution is incremental: it analyzes the distortions caused by existing privatization methods rather than proposing a new solution.

The study investigates how differential privacy, applied through text-to-text privatization, degrades BERT's performance. By analyzing internal representations, it finds that privatization reduces the overall similarity of those representations and impairs the encoding of contextual relationships between spans of words, while the encoding of localized word properties is preserved.

Differential Privacy (DP) has been tailored to address the unique challenges of text-to-text privatization. However, text-to-text privatization is known for degrading the performance of language models when trained on perturbed text. Employing a series of interpretation techniques on the internal representations extracted from BERT trained on perturbed pre-text, we intend to disentangle at the linguistic level the distortion induced by differential privacy. Experimental results from a representational similarity analysis indicate that the overall similarity of internal representations is substantially reduced. Using probing tasks to unpack this dissimilarity, we find evidence that text-to-text privatization affects the linguistic competence across several formalisms, encoding localized properties of words while falling short at encoding the contextual relationships between spans of words.
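To illustrate what a representational similarity analysis measures, the following is a minimal sketch, not the authors' implementation. It assumes cosine distance for the pairwise dissimilarities and Pearson correlation for comparing them; the abstract does not state which measures the paper uses. The function names (`dissimilarity_vector`, `rsa_score`) and the random vectors standing in for BERT representations are hypothetical.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dissimilarity_vector(reps):
    """Flatten the upper triangle of the pairwise cosine-distance matrix.

    `reps` holds one vector per input (for example, one sentence),
    all taken from the same model and layer.
    """
    n = len(reps)
    return [
        cosine_distance(reps[i], reps[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def rsa_score(reps_a, reps_b):
    """Compare two models by correlating their dissimilarity structures.

    Both arguments must describe the same inputs in the same order.
    A score near 1 means the two models arrange the inputs similarly.
    """
    return pearson(dissimilarity_vector(reps_a), dissimilarity_vector(reps_b))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins: 20 inputs, 16 dimensions (assumption, not real BERT output).
    baseline = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(20)]
    # Simulate a model whose representations drifted away from the baseline.
    drifted = [[x + rng.gauss(0, 1) for x in row] for row in baseline]
    print("baseline vs. itself:", rsa_score(baseline, baseline))
    print("baseline vs. drifted:", rsa_score(baseline, drifted))
```

In practice the two inputs would be representations of the same sentences taken from a BERT model trained on original text and one trained on privatized text, compared layer by layer. A lower score indicates that privatization has changed how the model organizes those sentences internally.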
