CLIR
May 21, 2021

Towards Automatic Comparison of Data Privacy Documents: A Preliminary Experiment on GDPR-like Laws

arXiv:2105.10117v1
Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses a time-consuming, language-dependent task for legal experts: evaluating data privacy regulations. Its contribution is incremental, since it applies existing NLP methods to a new domain.

The paper tackles the problem of manually comparing GDPR-like data privacy laws across countries by proposing a simple NLP approach that extracts structured information and measures document similarity, achieving the best performance with a BERT model and cosine similarity.

The General Data Protection Regulation (GDPR) has become a standard law for data protection in many countries. Currently, twelve countries have adopted the regulation and established their own GDPR-like regulations. However, evaluating the differences and similarities among these GDPR-like regulations is time-consuming and requires considerable manual effort from legal experts. Moreover, GDPR-like regulations from different countries are written in their own languages, which makes the task harder because it requires legal experts who know both languages. In this paper, we investigate a simple natural language processing (NLP) approach to tackle the problem. We first extract chunks of information from GDPR-like documents and form structured data from natural language. Next, we use NLP methods to compare the documents and measure their similarity. Finally, we manually label a small set of data to evaluate our approach. The empirical results show that the BERT model with cosine similarity outperforms the other baselines. Our data and code are publicly available.
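The compare-by-similarity step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`split_into_chunks`, `embed`, `cosine_similarity`, `best_matches`) are hypothetical, chunks are assumed to be separated by blank lines, and a plain word-count vector stands in for the BERT embedding so that the example runs with the standard library only. Only the cosine similarity measure itself is taken from the abstract.

```python
import math
import re
from collections import Counter


def split_into_chunks(document: str) -> list[str]:
    """Split a document into chunks, assuming blank lines separate them."""
    return [c.strip() for c in re.split(r"\n\s*\n", document) if c.strip()]


def embed(chunk: str) -> Counter:
    """Stand-in embedding (assumption): lowercase word counts, NOT BERT."""
    return Counter(re.findall(r"[a-z]+", chunk.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty chunk is treated as similar to nothing
    return dot / (norm_a * norm_b)


def best_matches(doc_a: str, doc_b: str) -> list[tuple[str, str, float]]:
    """For each chunk of doc_a, return its most similar chunk in doc_b."""
    chunks_b = split_into_chunks(doc_b)
    if not chunks_b:
        return []
    results = []
    for chunk_a in split_into_chunks(doc_a):
        vec_a = embed(chunk_a)
        scored = [(cosine_similarity(vec_a, embed(cb)), cb) for cb in chunks_b]
        score, match = max(scored, key=lambda pair: pair[0])
        results.append((chunk_a, match, score))
    return results
```

To approximate the setup the abstract reports as strongest, `embed` would be replaced by a function that returns a dense vector from a BERT-style sentence encoder, with `cosine_similarity` adapted to dense vectors. Which encoder and pooling the authors used is not stated here, so that substitution is left open.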

Code Implementations: 1 repo
