CL CR Feb 24, 2022

How reparametrization trick broke differentially-private text representation learning

arXiv:2202.12138v232.0639 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses a critical flaw in privacy-preserving NLP methods, highlighting pitfalls for researchers applying differential privacy to text data.

The paper reveals that several recent NLP papers claiming differential privacy in text representation learning actually have false privacy guarantees, and provides an empirical sanity check to detect such violations.

As privacy gains traction in the NLP community, researchers have started adopting various approaches to privacy-preserving methods. One of the favorite privacy frameworks, differential privacy (DP), is perhaps the most compelling thanks to its fundamental theoretical guarantees. Despite the apparent simplicity of the general concept of differential privacy, it seems non-trivial to get it right when applying it to NLP. In this short paper, we formally analyze several recent NLP papers proposing text representation learning using DPText (Beigi et al., 2019a,b; Alnasser et al., 2021; Beigi et al., 2021) and reveal their false claims of being differentially private. Furthermore, we also show a simple yet general empirical sanity check to determine whether a given implementation of a DP mechanism almost certainly violates the privacy loss guarantees. Our main goal is to raise awareness and help the community understand potential pitfalls of applying differential privacy to text representation learning.
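The kind of empirical sanity check the abstract mentions can be illustrated with a minimal sketch. This is not the paper's procedure; it assumes a scalar mechanism, two neighboring inputs, and pure epsilon-DP. The function names, the bin width, and the deliberately under-scaled `broken_mechanism` are all hypothetical choices made for illustration. The idea is to sample the mechanism on both inputs, bin the outputs, and compare the largest observed log-ratio of bin frequencies with epsilon.

```python
import math
import random
from collections import Counter


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Add Laplace(0, sensitivity / epsilon) noise.

    The difference of two i.i.d. exponentials with rate r is Laplace with scale 1 / r.
    """
    rate = epsilon / sensitivity
    return value + rng.expovariate(rate) - rng.expovariate(rate)


def broken_mechanism(value, sensitivity, epsilon, rng):
    """Hypothetical buggy variant: the noise scale is 10x too small."""
    rate = 10.0 * epsilon / sensitivity
    return value + rng.expovariate(rate) - rng.expovariate(rate)


def estimate_max_privacy_loss(mechanism, x, x_neighbor, sensitivity, epsilon,
                              n_samples=200_000, bin_width=0.5,
                              min_count=500, seed=0):
    """Estimate the largest |log(P[M(x) in bin] / P[M(x') in bin])| over bins.

    Bins with fewer than min_count samples on either side are skipped so
    that sampling noise in sparse bins does not dominate the estimate.
    """
    rng = random.Random(seed)
    counts_a = Counter(
        math.floor(mechanism(x, sensitivity, epsilon, rng) / bin_width)
        for _ in range(n_samples)
    )
    counts_b = Counter(
        math.floor(mechanism(x_neighbor, sensitivity, epsilon, rng) / bin_width)
        for _ in range(n_samples)
    )
    worst = 0.0
    for b in counts_a.keys() & counts_b.keys():
        ca, cb = counts_a[b], counts_b[b]
        if ca >= min_count and cb >= min_count:
            worst = max(worst, abs(math.log(ca / cb)))
    return worst


if __name__ == "__main__":
    eps = 1.0
    for mech in (laplace_mechanism, broken_mechanism):
        loss = estimate_max_privacy_loss(mech, 0.0, 1.0, 1.0, eps)
        print(f"{mech.__name__}: estimated max privacy loss {loss:.2f} (epsilon = {eps})")
```

The check is one-sided. An estimate far above epsilon, beyond what sampling noise could explain, is strong evidence that the claimed guarantee does not hold. An estimate near or below epsilon proves nothing, since only two inputs and a finite set of bins were examined.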

