CL · Jan 11, 2019

EQUATE: A Benchmark Evaluation Framework for Quantitative Reasoning in Natural Language Inference

Abhilasha Ravichander, Aakanksha Naik, Carolyn Rose, Eduard Hovy

arXiv:1901.03735v230.61032 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for better evaluation of quantitative reasoning in NLP systems, a capability that matters for developing more intelligent language understanding models. It is rated incremental because it focuses on benchmarking and baseline improvements.

The authors evaluate quantitative reasoning in natural language inference by introducing EQUATE, a benchmark framework. They find that, on average, state-of-the-art NLI models do not improve over a majority-class baseline, which suggests the models do not implicitly learn to reason with quantities. They also establish a new baseline, Q-REAS, which manipulates quantities symbolically. Compared with the best-performing NLI model, Q-REAS scores 24.2% higher on numerical reasoning tests but 8.1% lower on verbal reasoning.

Quantitative reasoning is a higher-order reasoning skill that any intelligent natural language understanding system can reasonably be expected to handle. We present EQUATE (Evaluating Quantitative Understanding Aptitude in Textual Entailment), a new framework for quantitative reasoning in textual entailment. We benchmark the performance of 9 published NLI models on EQUATE, and find that on average, state-of-the-art methods do not achieve an absolute improvement over a majority-class baseline, suggesting that they do not implicitly learn to reason with quantities. We establish a new baseline Q-REAS that manipulates quantities symbolically. In comparison to the best performing NLI model, it achieves success on numerical reasoning tests (+24.2%), but has limited verbal reasoning capabilities (-8.1%). We hope our evaluation framework will support the development of models of quantitative reasoning in language understanding.
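The comparison against a majority-class baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, label strings, and toy data below are all hypothetical, and the sketch only assumes that accuracy is the metric and that "absolute improvement" means the difference in accuracy, in percentage points.

```python
from collections import Counter


def accuracy(predictions, gold_labels):
    """Fraction of examples where the prediction matches the gold label."""
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)


def majority_class_accuracy(gold_labels):
    """Accuracy obtained by always predicting the most frequent gold label."""
    _, majority_count = Counter(gold_labels).most_common(1)[0]
    return majority_count / len(gold_labels)


def absolute_improvement(predictions, gold_labels):
    """Model accuracy minus majority-class accuracy, in percentage points."""
    return 100.0 * (
        accuracy(predictions, gold_labels) - majority_class_accuracy(gold_labels)
    )


# Toy data (hypothetical, not drawn from EQUATE).
gold = ["entailment", "entailment", "entailment", "contradiction", "neutral"]
preds = ["entailment", "contradiction", "entailment", "contradiction", "neutral"]

print(f"majority baseline: {majority_class_accuracy(gold):.2f}")
print(f"model accuracy:    {accuracy(preds, gold):.2f}")
print(f"improvement (pts): {absolute_improvement(preds, gold):.1f}")
```

A value at or below zero from `absolute_improvement` would mean the model does no better than always guessing the most common label, which is the situation the abstract reports on average for the benchmarked NLI models.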

