CL · Oct 6, 2020

PolicyQA: A Reading Comprehension Dataset for Privacy Policies

Wasi Uddin Ahmad, Jianfeng Chi, Yuan Tian, Kai-Wei Chang

arXiv:2010.02557v1 · 31.31002 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This dataset helps users and developers retrieve information from privacy policies more efficiently. The contribution is incremental: it applies existing QA frameworks to a specific domain.

The authors introduced PolicyQA, a dataset of 25,017 reading comprehension examples from 115 privacy policies, to address the challenge of extracting short text spans from verbose documents, and evaluated it with neural QA models to highlight its advantages and challenges.

Privacy policy documents are long and verbose. A question answering (QA) system can assist users in finding the information that is relevant and important to them. Prior studies in this domain frame the QA task as retrieving the most relevant text segment or a list of sentences from the policy document given a question. In contrast, we argue that providing users with a short text span from policy documents reduces the burden of searching for the target information in a lengthy text segment. In this paper, we present PolicyQA, a dataset that contains 25,017 reading comprehension style examples curated from an existing corpus of 115 website privacy policies. PolicyQA provides 714 human-annotated questions written for a wide range of privacy practices. We evaluate two existing neural QA models and perform rigorous analysis to reveal the advantages and challenges offered by PolicyQA.
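To make the contrast between segment retrieval and span extraction concrete, here is a minimal sketch of what a reading comprehension style example might look like. The field names follow a common SQuAD-like convention and are an assumption, not the PolicyQA schema; the policy text and question are invented for illustration and are not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class SpanExample:
    """One span-extraction example (hypothetical layout, SQuAD-like)."""
    context: str       # policy segment the answer is drawn from
    question: str      # user question about a privacy practice
    answer_start: int  # character offset of the answer within context
    answer_text: str   # the short answer span itself

    def span(self) -> str:
        # Recover the answer from the context using the stored offset.
        end = self.answer_start + len(self.answer_text)
        return self.context[self.answer_start:end]


# Invented policy segment, for illustration only.
context = (
    "We collect your email address and device identifiers when you "
    "create an account. We share this information with advertising partners."
)
answer = "email address and device identifiers"

example = SpanExample(
    context=context,
    question="What information do you collect?",
    answer_start=context.index(answer),
    answer_text=answer,
)

# A segment-retrieval system would return the whole context;
# a span-extraction system returns only the short answer.
print("segment length:", len(example.context))
print("span:", example.span())
```

The stored offset lets the answer be checked against the context, which is the property that separates span extraction from returning a whole segment.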

