Transforming Question Answering Datasets Into Natural Language Inference Datasets
This work addresses the need for more diverse NLI datasets in natural language processing research, though its contribution is incremental in that it builds on existing QA resources.
The paper tackles the scarcity of natural language inference (NLI) datasets by proposing a method that automatically derives them from question answering (QA) datasets, yielding a new dataset of over 500k examples that exhibits a wide range of inference phenomena.
Existing datasets for natural language inference (NLI) have propelled research on language understanding. We propose a new method for automatically deriving NLI datasets from the growing abundance of large-scale question answering datasets. Our approach hinges on learning a sentence transformation model which converts question-answer pairs into their declarative forms. Despite being primarily trained on a single QA dataset, we show that it can be successfully applied to a variety of other QA resources. Using this system, we automatically derive a new freely available dataset of over 500k NLI examples (QA-NLI), and show that it exhibits a wide range of inference phenomena rarely seen in previous NLI datasets.
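To make the pipeline in the abstract concrete, the following is a minimal sketch of how a question-answer pair might be turned into an NLI example. It is not the paper's system: the learned sentence transformation model is replaced here by a toy rule that handles only questions whose wh-word is the subject (for example, "Who wrote Hamlet?"). The names `NLIExample`, `qa_to_declarative`, and `build_nli_example` are hypothetical, and the labeling scheme (a correct answer yields an entailed hypothesis, an incorrect one does not) is an assumption that the abstract does not spell out.

```python
from dataclasses import dataclass


@dataclass
class NLIExample:
    premise: str      # supporting passage from the QA dataset
    hypothesis: str   # declarative form of the question-answer pair
    label: str        # "entailment" or "not_entailment" (assumed label set)


def qa_to_declarative(question: str, answer: str) -> str:
    """Toy stand-in for a learned QA-to-declarative transformation.

    Only handles questions that start with a subject wh-word ("who"/"what"),
    where substituting the answer for the wh-word gives a declarative sentence.
    """
    words = question.strip().rstrip("?").split()
    if not words or words[0].lower() not in {"who", "what"}:
        raise ValueError("unsupported question form for this toy rule")
    # Replace the wh-word with the answer and close the sentence.
    return " ".join([answer.strip()] + words[1:]) + "."


def build_nli_example(passage: str, question: str, answer: str,
                      is_correct: bool) -> NLIExample:
    """Pair the passage (premise) with the declarative hypothesis.

    Assumption: hypotheses built from correct answers are labeled as entailed,
    and those built from incorrect answers as not entailed.
    """
    hypothesis = qa_to_declarative(question, answer)
    label = "entailment" if is_correct else "not_entailment"
    return NLIExample(premise=passage, hypothesis=hypothesis, label=label)


if __name__ == "__main__":
    passage = "Hamlet is a tragedy written by William Shakespeare."
    print(build_nli_example(passage, "Who wrote Hamlet?", "Shakespeare", True))
    print(build_nli_example(passage, "Who wrote Hamlet?", "Marlowe", False))
```

A hand-written rule like this breaks on most real questions (object questions, auxiliaries, "how many", and so on), which is the kind of gap a learned transformation model is meant to cover.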