CL · Apr 23, 2018

Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation

Adam Poliak, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie Pavlick, Aaron Steven White, Benjamin Van Durme

arXiv:1804.08207v2 · 32.91180 citations

Originality: Synthesis-oriented

AI Analysis

The DNC gives researchers a resource for assessing how well sentence representations capture distinct types of reasoning, though the contribution is incremental because it repurposes existing data rather than collecting new annotations.

The authors address the problem of evaluating sentence representations by building a large-scale collection of diverse natural language inference (NLI) datasets: 13 existing datasets, recast into a common NLI structure, yield over half a million labeled context-hypothesis pairs.

We present a large-scale collection of diverse natural language inference (NLI) datasets that help provide insight into how well a sentence representation captures distinct types of reasoning. The collection results from recasting 13 existing datasets covering 7 semantic phenomena into a common NLI structure, yielding over half a million labeled context-hypothesis pairs in total. We refer to our collection as the DNC: Diverse Natural Language Inference Collection. The DNC is available online at https://www.decomp.net, and will grow over time as additional resources are recast and added from novel sources.
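To make "recasting into a common NLI structure" concrete, here is a minimal Python sketch that turns one labeled sentiment example into context-hypothesis pairs. It assumes a binary entailed/not-entailed label scheme and a simple fill-in template. The `NLIPair` type, the `recast_sentiment` helper, and the template wording are hypothetical illustrations, not the DNC's released code or file format.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical record type for one recast example; field names are
# illustrative assumptions, not the DNC's actual schema.
@dataclass(frozen=True)
class NLIPair:
    context: str
    hypothesis: str
    label: str       # assumed binary scheme: "entailed" or "not-entailed"
    phenomenon: str  # semantic phenomenon targeted by the source dataset


def recast_sentiment(review: str, is_positive: bool, speaker: str = "John") -> List[NLIPair]:
    """Recast one labeled sentiment example into two NLI pairs.

    Hypothetical template: the review is embedded in a context sentence,
    and two hypotheses assert opposite sentiments. The source label
    decides which hypothesis is entailed.
    """
    context = f'When asked about the product, {speaker} said "{review}".'
    liked = f"{speaker} liked the product."
    disliked = f"{speaker} did not like the product."
    return [
        NLIPair(context, liked, "entailed" if is_positive else "not-entailed", "sentiment"),
        NLIPair(context, disliked, "not-entailed" if is_positive else "entailed", "sentiment"),
    ]


if __name__ == "__main__":
    # A positive review yields one entailed and one not-entailed pair.
    for pair in recast_sentiment("I love it, works perfectly", is_positive=True):
        print(pair.label, "|", pair.hypothesis)
    # prints:
    # entailed | John liked the product.
    # not-entailed | John did not like the product.
```

Generating both polarities from each source example keeps the two labels balanced in this sketch, and tagging every pair with its phenomenon lets errors be attributed to a specific type of reasoning.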

