CR, CL · Jul 4, 2022

A Customized Text Sanitization Mechanism with Differential Privacy

Huimin Chen, Fengran Mo, Yanhao Wang, Cen Chen, Jian-Yun Nie, Chengyu Wang, Jamie Cui

arXiv:2207.01193v248.8242 citations · h-index: 25 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses privacy issues in NLP for users and developers, offering an incremental improvement over current text sanitization methods.

The paper tackles the problem of text sanitization for privacy in NLP by proposing a new mechanism that works with any similarity measure and provides token-level protection, achieving a better privacy-utility trade-off than existing methods in experiments on benchmark datasets.

As privacy issues are receiving increasing attention within the Natural Language Processing (NLP) community, numerous methods have been proposed to sanitize texts subject to differential privacy. However, the state-of-the-art text sanitization mechanisms based on metric local differential privacy (MLDP) do not apply to non-metric semantic similarity measures and cannot achieve good trade-offs between privacy and utility. To address the above limitations, we propose a novel Customized Text (CusText) sanitization mechanism based on the original $ε$-differential privacy (DP) definition, which is compatible with any similarity measure. Furthermore, CusText assigns each input token a customized output set of tokens to provide more advanced privacy protection at the token level. Extensive experiments on several benchmark datasets show that CusText achieves a better trade-off between privacy and utility than existing mechanisms. The code is available at https://github.com/sai4july/CusText.
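The token-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes the exponential mechanism as the sampling rule, which the abstract does not specify, and every name in it (`output_probabilities`, `sanitize`, `toy_similarity`, `toy_value`, `output_sets`) is hypothetical. Each input token is mapped to its own output set, and a replacement is drawn with probability proportional to exp(ε·u/2), where u is the similarity score rescaled to [0, 1]. No metric properties are required of the similarity function.

```python
import math
import random


def output_probabilities(token, candidates, similarity, epsilon):
    """Exponential-mechanism probabilities over one token's output set.

    Scores are min-max rescaled to [0, 1], so the utility sensitivity is at
    most 1 and weights proportional to exp(epsilon * u / 2) give an
    epsilon-DP guarantee among input tokens that share this output set.
    """
    scores = [similarity(token, c) for c in candidates]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    utils = [(s - lo) / span if span > 0 else 1.0 for s in scores]
    weights = [math.exp(epsilon * u / 2.0) for u in utils]
    total = sum(weights)
    return [w / total for w in weights]


def sanitize(tokens, output_sets, similarity, epsilon, rng=None):
    """Replace every token with a sample from its customized output set."""
    rng = rng or random.Random()
    sanitized = []
    for tok in tokens:
        candidates = output_sets[tok]  # every token needs an output set
        probs = output_probabilities(tok, candidates, similarity, epsilon)
        sanitized.append(rng.choices(candidates, weights=probs, k=1)[0])
    return sanitized


# Toy data (assumption): hand-made scalar "embeddings" and a similarity
# that is simply the negated absolute difference between them.
toy_value = {"good": 0.9, "great": 1.0, "fine": 0.6, "awful": -0.9, "bad": -0.8}


def toy_similarity(a, b):
    return -abs(toy_value[a] - toy_value[b])


positive = ["good", "great", "fine"]
negative = ["awful", "bad"]
output_sets = {t: positive for t in positive}
output_sets.update({t: negative for t in negative})

if __name__ == "__main__":
    print(sanitize(["good", "bad", "fine"], output_sets, toy_similarity, epsilon=2.0))
```

In this sketch a smaller `epsilon` flattens the distribution toward uniform (more privacy, less utility), while a larger one concentrates mass on the most similar candidates. The guarantee only covers tokens that share an output set, so how those sets are constructed matters as much as the sampling step.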
