CL · Nov 3, 2019

Question Answering for Privacy Policies: Combining Computational and Legal Perspectives

Abhilasha Ravichander, Alan W Black, Shomir Wilson, Thomas Norton, Norman Sadeh

arXiv:1911.00841v130.61024 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the difficulty users face in comprehending complex privacy policies. Its contribution is incremental, since it primarily introduces a new dataset.

The authors address this problem by creating PrivacyQA, a corpus of 1750 questions about the privacy policies of mobile applications, with over 3500 expert annotations of relevant answers. They find that a strong neural baseline underperforms human performance by almost 0.3 F1, which indicates considerable room for improvement.

Privacy policies are long and complex documents that are difficult for users to read and understand, and yet, they have legal effects on how user data is collected, managed and used. Ideally, we would like to empower users to inform themselves about issues that matter to them, and enable them to selectively explore those issues. We present PrivacyQA, a corpus consisting of 1750 questions about the privacy policies of mobile applications, and over 3500 expert annotations of relevant answers. We observe that a strong neural baseline underperforms human performance by almost 0.3 F1 on PrivacyQA, suggesting considerable room for improvement for future systems. Further, we use this dataset to shed light on challenges to question answerability, with domain-general implications for any question answering system. The PrivacyQA corpus offers a challenging corpus for question answering, with genuine real-world utility.
