CL · May 29, 2020

Beyond Leaderboards: A survey of methods for revealing weaknesses in Natural Language Inference data and models

Viktor Schlegel, Goran Nenadic, Riza Batista-Navarro

arXiv:2005.14709

Originality: Synthesis-oriented

AI Analysis

The survey provides a structured resource that NLI researchers can use to improve dataset design and model evaluation. Its contribution is incremental, however, because it synthesizes existing work.

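The paper surveys methods for identifying weaknesses in Natural Language Inference (NLI) datasets and models. It categorizes the reported weaknesses and the methods proposed to reveal and alleviate them, and it collects tools that dataset and model developers can use to assess data quality and model capabilities.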

Recent years have seen a growing number of publications that analyse Natural Language Inference (NLI) datasets for superficial cues. These studies investigate whether such cues undermine the complexity of the tasks underlying those datasets, and how they affect the models that are optimised and evaluated on this data. This structured survey gives an overview of this evolving research area for the English language. It categorises the weaknesses reported in models and datasets, together with the methods proposed to reveal and alleviate them. We summarise and discuss the findings and conclude with a set of recommendations for future research directions. We hope the survey will serve two groups of researchers. Those who propose new datasets gain a set of tools to assess whether their data is suitable, and of sufficient quality, for evaluating various phenomena of interest. Those who develop novel architectures gain a better understanding of what their improvements imply for the capabilities their models acquire.
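Superficial cues of the kind described above are often surfaced with simple statistics computed over the hypotheses alone. The following Python sketch is an illustration only and is not a method taken from this survey. It scores each hypothesis token by its pointwise mutual information (PMI) with each label. A token with high PMI for one label, such as a negation word for contradiction, is a candidate cue that a model could exploit without reading the premise. The function names `cue_scores` and `top_cues`, the whitespace tokenisation, the `min_count` filter and the toy examples are all hypothetical assumptions.

```python
import math
from collections import Counter

def cue_scores(examples, min_count=2):
    """Hypothetical helper: PMI(token, label) over hypothesis tokens only.

    examples: list of (hypothesis, label) pairs. Each token is counted at
    most once per hypothesis, and tokens seen in fewer than `min_count`
    hypotheses are skipped because rare tokens get unreliably high PMI.
    """
    n = len(examples)
    label_counts = Counter(label for _, label in examples)
    token_counts = Counter()
    joint_counts = Counter()
    for hypothesis, label in examples:
        for tok in set(hypothesis.lower().split()):  # naive whitespace tokenisation
            token_counts[tok] += 1
            joint_counts[(tok, label)] += 1
    scores = {}
    for (tok, label), joint in joint_counts.items():
        if token_counts[tok] < min_count:
            continue
        # PMI = log p(tok, label) / (p(tok) * p(label)), with counts divided by n
        scores[(tok, label)] = math.log(
            joint * n / (token_counts[tok] * label_counts[label])
        )
    return scores

def top_cues(scores, label, k=3):
    """Return the k tokens most associated with `label` (highest PMI first)."""
    candidates = [(tok, s) for (tok, lab), s in scores.items() if lab == label]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))  # ties broken alphabetically
    return [tok for tok, _ in candidates[:k]]

# Toy, made-up hypotheses; premises are deliberately ignored.
examples = [
    ("a man is not sleeping", "contradiction"),
    ("the dog is not running", "contradiction"),
    ("a woman is outdoors", "entailment"),
    ("a dog is running outdoors", "entailment"),
]

scores = cue_scores(examples)
print(top_cues(scores, "contradiction", k=1))  # ['not']
```

In this toy data, "not" appears only in contradiction hypotheses, so it receives the highest PMI for that label. On a real dataset, a strong association of this kind would motivate further checks, for example whether a hypothesis-only classifier beats the majority baseline.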

