Towards Automatic Comparison of Data Privacy Documents: A Preliminary Experiment on GDPR-like Laws
This work addresses the time-consuming, language-dependent task that legal experts face when evaluating data privacy regulations, though it is incremental, since it applies existing NLP methods to a new domain.
The paper tackles the problem of manually comparing GDPR-like data privacy laws across countries by proposing a simple NLP approach that extracts structured information and measures document similarity, achieving the best performance with a BERT model and cosine similarity.
The General Data Protection Regulation (GDPR) has become a standard law for data protection in many countries. Currently, twelve countries have adopted the regulation and established their own GDPR-like regulations. However, evaluating the differences and similarities among these GDPR-like regulations is time-consuming and requires considerable manual effort from legal experts. Moreover, GDPR-like regulations from different countries are written in their own languages, which makes the task harder because it requires legal experts who know both languages. In this paper, we investigate a simple natural language processing (NLP) approach to tackle the problem. We first extract chunks of information from GDPR-like documents and form structured data from natural language. Next, we use NLP methods to compare the documents and measure their similarity. Finally, we manually label a small set of data to evaluate our approach. The empirical results show that the BERT model with cosine similarity outperforms the other baselines. Our data and code are publicly available.
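The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (embed, cosine_similarity, best_matches) and the example chunks are hypothetical, and a toy bag-of-words vector stands in for the BERT embeddings so that the example runs with the Python standard library alone. Only the cosine-similarity matching between chunks of two documents follows the abstract.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a BERT encoder (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dict-like counters."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty chunk is treated as similar to nothing
    return dot / (norm_u * norm_v)


def best_matches(chunks_a, chunks_b):
    """For each chunk in A, return (index in B, score) of its most similar chunk."""
    vecs_b = [embed(c) for c in chunks_b]
    matches = []
    for chunk in chunks_a:
        vec = embed(chunk)
        scores = [cosine_similarity(vec, w) for w in vecs_b]
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    return matches


# Hypothetical example chunks, not taken from any real regulation.
law_a = [
    "The data subject has the right to erasure of personal data.",
    "The controller shall notify the authority of a data breach.",
]
law_b = [
    "A data breach must be reported to the authority by the controller.",
    "Personal data shall be erased at the request of the data subject.",
]

for i, (j, score) in enumerate(best_matches(law_a, law_b)):
    print(f"A[{i}] -> B[{j}] (cosine = {score:.2f})")
```

Replacing embed with a function that returns dense sentence vectors from a BERT model would leave the matching logic unchanged, provided cosine_similarity is adapted to dense arrays. For regulations written in different languages, a multilingual encoder would presumably be needed; the abstract does not say which one is used.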