CL · Nov 5, 2022

Privacy-Preserving Models for Legal Natural Language Processing

arXiv:2211.02956v1 · 295 citations · h-index: 24 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the privacy concerns legal NLP practitioners face when using pre-trained models on sensitive data. It is incremental, however, because it applies existing differential privacy methods to a new domain.

The paper tackles the problem of sharing pre-trained transformer models on sensitive legal data without compromising privacy, and shows that using differential privacy in specific training configurations can improve downstream performance on legal NLP tasks without sacrificing privacy protection.

Pre-training large transformer models on in-domain data improves domain adaptation and boosts performance on domain-specific downstream tasks. However, sharing models pre-trained on potentially sensitive data leaves them vulnerable to adversarial privacy attacks. In this paper, we ask to what extent we can guarantee the privacy of pre-training data and, at the same time, achieve better downstream performance on legal tasks without the need for additional labeled data. We experiment extensively with scalable self-supervised learning of transformer models under the formal paradigm of differential privacy and show that, under specific training configurations, we can improve downstream performance without sacrificing privacy protection for the in-domain data. Our main contribution is applying differential privacy to large-scale pre-training of transformer language models in the legal NLP domain, which, to the best of our knowledge, has not been addressed before.
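The abstract does not spell out how differential privacy enters training. The most common route is DP-SGD (Abadi et al., 2016): clip each example's gradient to a fixed L2 norm, add Gaussian noise scaled to that norm, then average. The sketch below shows one such step on a toy logistic-regression model in NumPy. It is an illustration of the general technique under stated assumptions, not the paper's implementation. The function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are hypothetical. The sketch also performs no privacy accounting, so it does not compute an epsilon.

```python
import numpy as np


def dp_sgd_step(w, X, y, lr, clip_norm, noise_multiplier, rng):
    """One illustrative DP-SGD step for logistic regression (hypothetical helper).

    Assumes a batch X of shape (batch, dim) and binary labels y in {0, 1}.
    """
    # Per-example gradients of the logistic loss: (sigmoid(x . w) - y) * x
    preds = 1.0 / (1.0 + np.exp(-X @ w))
    per_example_grads = (preds - y)[:, None] * X  # shape (batch, dim)

    # Clip each example's gradient to an L2 norm of at most clip_norm,
    # bounding any single example's influence on the update.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # Add Gaussian noise with std = noise_multiplier * clip_norm to the sum,
    # then average over the batch.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / X.shape[0]
    return w - lr * noisy_grad


if __name__ == "__main__":
    # Toy usage on synthetic data (hypothetical values, not from the paper).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 5))
    y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) > 0).astype(float)
    w = np.zeros(5)
    for _ in range(200):
        idx = rng.choice(len(X), size=64, replace=False)
        w = dp_sgd_step(w, X[idx], y[idx], lr=0.5, clip_norm=1.0,
                        noise_multiplier=1.1, rng=rng)
    print("trained weights:", w)
```

Setting `noise_multiplier=0` and a very large `clip_norm` recovers plain mini-batch SGD. The privacy guarantee comes from the combination of bounded per-example sensitivity (clipping) and calibrated noise. A formal (ε, δ) bound also depends on the sampling scheme and requires a privacy accountant, which this sketch omits.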

Code Implementations: 1 repo