Which Spurious Correlations Impact Reasoning in NLI Models? A Visual Interactive Diagnosis through Data-Constrained Counterfactuals
This work addresses unreliable predictions in NLI models, a concern for both researchers and practitioners. Its contribution is incremental, since it builds on existing counterfactual and diagnostic methods.
The paper tackles the problem of identifying spurious correlations that affect reasoning in Natural Language Inference (NLI) models. It develops a human-in-the-loop dashboard for interactive diagnosis and groups the correlations it finds into three categories: Semantic Relevance, Logical Fallacies, and Bias.
We present a human-in-the-loop dashboard tailored to diagnosing potential spurious features that NLI models rely on for predictions. The dashboard enables users to generate diverse and challenging examples by drawing inspiration from GPT-3 suggestions. Users can also receive feedback from a trained NLI model on how challenging a newly created example is, and refine the example based on that feedback. Through our investigation, we discover several spurious correlations that impact the reasoning of NLI models, which we group into three categories: Semantic Relevance, Logical Fallacies, and Bias. Based on our findings, we identify and describe various research opportunities, including diversifying training data and assessing NLI models' robustness by creating adversarial test suites.
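The feedback step described above can be illustrated with a minimal sketch. It is not the dashboard's actual implementation: the abstract does not say how "challenging" is measured, so the score used here (one minus the model's probability for the gold label) is an assumption, and `overlap_heuristic_model` is a hypothetical stand-in for a trained NLI model that deliberately relies on one spurious cue, word overlap.

```python
from typing import Callable, Dict

LABELS = ("entailment", "neutral", "contradiction")


def overlap_heuristic_model(premise: str, hypothesis: str) -> Dict[str, float]:
    """Hypothetical stand-in for a trained NLI model.

    It imitates a spurious cue: the more hypothesis words also occur in the
    premise, the more probability it assigns to 'entailment'.
    """
    p_words = set(premise.lower().split())
    h_words = set(hypothesis.lower().split())
    overlap = len(p_words & h_words) / max(len(h_words), 1)
    p_entail = 0.2 + 0.6 * overlap
    rest = (1.0 - p_entail) / 2  # split the remainder over the other labels
    return {"entailment": p_entail, "neutral": rest, "contradiction": rest}


def challenge_score(probs: Dict[str, float], gold_label: str) -> float:
    """Assumed metric: 1 - P(gold label). Higher means harder for the model."""
    if gold_label not in LABELS:
        raise ValueError(f"unknown label: {gold_label}")
    return 1.0 - probs.get(gold_label, 0.0)


def feedback(
    model: Callable[[str, str], Dict[str, float]],
    premise: str,
    hypothesis: str,
    gold_label: str,
) -> Dict[str, object]:
    """Summarize how the model handles a user-written example."""
    probs = model(premise, hypothesis)
    predicted = max(probs, key=probs.get)
    return {
        "predicted": predicted,
        "fooled": predicted != gold_label,
        "challenge": challenge_score(probs, gold_label),
    }


if __name__ == "__main__":
    # Same words, swapped roles: the hypothesis does not follow from the premise.
    result = feedback(
        overlap_heuristic_model,
        premise="the doctor paid the actor",
        hypothesis="the actor paid the doctor",
        gold_label="neutral",
    )
    print(result)
```

In a real setting the stand-in would be replaced by the trained NLI model's predicted label distribution; a user would then revise the example and call `feedback` again until the example is judged sufficiently challenging.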